I don't have Akka, but with just Spark, I edited log4j.properties to
"log4j.rootCategory=ERROR, console", ran the following command, and was
able to get only the Time row as output.
./bin/run-example org.apache.spark.examples.streaming.JavaNetworkWordCount localhost 9999
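For reference, the change is to the root logger line in conf/log4j.properties
(copy conf/log4j.properties.template to that name if the file doesn't exist yet):

  # conf/log4j.properties
  # Only ERROR goes to the console, so the streaming "Time: ..." batch
  # marker is about all that's left in the output
  log4j.rootCategory=ERROR, console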
--
This is a slightly old thread, but in case other users still want to know
the answer, check the following page. The property is set in
conf/spark-env.sh:
http://arjon.es/2014/04/14/how-to-change-default-serializer-on-apache-spark-shell/
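The linked post sets the property through conf/spark-env.sh; the same thing
can also be done in conf/spark-defaults.conf, e.g.:

  # conf/spark-defaults.conf
  spark.serializer org.apache.spark.serializer.KryoSerializer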
--
I think that example is included with Spark. The source code is in
examples/src/main/java/org/apache/spark/examples/sql
It can be executed with the following command:
./bin/run-example org.apache.spark.examples.sql.JavaSparkSQL
--
If Kryo usage is recommended, why is Java serialization the default
serializer instead of Kryo? Is there some limitation to using Kryo? I've
read through the documentation, but it seems Kryo is the better choice and
should be made the default.
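One caveat the Spark docs mention is that Kryo works best when you register
your classes up front (unregistered classes still work, but each record
carries the full class name, which costs space). A minimal Scala sketch,
with MyClass as a stand-in for one of your own types:

  import org.apache.spark.SparkConf

  case class MyClass(id: Int)  // stand-in for an application class

  val conf = new SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // registered classes get compact numeric IDs instead of full class names
    .registerKryoClasses(Array(classOf[MyClass]))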
--
I'm not sure whether the following will work with Parquet output, but have
you tried setting sc.setLogLevel("ERROR") or setting log levels in Spark's
log4j.properties file?
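For example, from the shell (setLogLevel is available on SparkContext from
Spark 1.4 onward):

  sc.setLogLevel("ERROR")  // suppress INFO/WARN chatter for this context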
--
I don't quite follow your question about scheduling. Did you create a Spark
application and are you asking how to schedule it to run? Are you going to
write the results of the scheduled run to HDFS and join them with the
real-time results in the first chain?
--
I think it depends on the versions. Mixing something like 0.9.2 and 1.5.1
isn't recommended.
1.5.1 is a minor bug-fix release on top of 1.5.0, so I think most things
will work, but some features may behave differently, so it's better to use
the same revision.
Changes between versions/releases are listed in CHANGES.txt.
You can save the result to storage (e.g. Hive) and have a web application
read the data from there.
I think there's also a "toJSON" method to convert a DataFrame to JSON.
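For example, a rough sketch assuming a DataFrame named results obtained from
a HiveContext:

  // persist the result as a table for a web application to query later
  results.write.saveAsTable("results_table")
  // or hand the rows to the web tier as JSON strings
  val json: Array[String] = results.toJSON.collect()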
Another option is to use something like Spark Kernel
(https://github.com/ibm-et/spark-kernel/wiki).
Another choice is to
It depends on how much data needs to be processed. A data warehouse with
indexes is going to be faster when there is not much data. If you have big
data, Spark Streaming and maybe Spark SQL may interest you.
--
Are you sure you downloaded the pre-built version? The default download is
the source package.
Please check whether the file you downloaded starts with "spark-1.5.1-bin-",
with a "bin".
--
There's a document describing the format of the files in the parent
directory. It looks like a fixed-width file format:
ftp://ftp.ncdc.noaa.gov/pub/data/noaa/ish-format-document.pdf
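A minimal Scala sketch of parsing such a fixed-width line; the offsets below
are made up for illustration, so take the real positions from the format
document above:

  // hypothetical field offsets -- check ish-format-document.pdf for the real ones
  val records = sc.textFile("hdfs:///data/noaa/")
    .map { line =>
      val stationId = line.substring(4, 10)    // assumed offsets
      val year      = line.substring(15, 19).toInt
      (stationId, year)
    }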
--
I'm a little late, but posting in case somebody googles this.
It seems saveAsTextFile requires chmod 777 on the target directory, but by
default a local directory won't give "w" to other users.
I tried saving to a mounted drive and was able to save without an error.
Without the "file://" prefix, it won't save to the local filesystem.
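For example, a sketch assuming an RDD named rdd and a world-writable mount
at /mnt/shared:

  // "file://" forces the local filesystem; without a scheme the path is
  // resolved against the default filesystem (HDFS on a cluster)
  rdd.saveAsTextFile("file:///mnt/shared/output")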
--
Have you tried using spark-csv?
https://github.com/databricks/spark-csv
e.g.

  hiveSQLContext.sql("FROM employee SELECT name, city, state")
    .write.format("com.databricks.spark.csv")
    .save("employee.csv")
--
I have Spark 1.5.1 running directly on Windows 7 but would like to run it
in Console2.
I have JAVA_HOME, SCALA_HOME, and SPARK_HOME set up and have verified that
Java and Scala are working properly (ran -version and was able to run
programs).
However, when I try to use Spark via "spark-shell", it returns