Re: How to configure logging...

2015-11-10 Thread Hitoshi
I don't have Akka, but with just Spark I edited log4j.properties to set "log4j.rootCategory=ERROR, console", ran the following command, and got only the Time row as output:

  run-example org.apache.spark.examples.streaming.JavaNetworkWordCount localhost
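
For reference, a minimal sketch of such a conf/log4j.properties; the appender lines follow the stock Spark template, and only the rootCategory line is the change described above:

  # conf/log4j.properties -- minimal sketch; only rootCategory differs from the template
  log4j.rootCategory=ERROR, console
  log4j.appender.console=org.apache.log4j.ConsoleAppender
  log4j.appender.console.target=System.err
  log4j.appender.console.layout=org.apache.log4j.PatternLayout
  log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n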

Re: Kryo serializer

2015-11-09 Thread Hitoshi Ozawa
This is a somewhat old thread, but in case other users still want to know the answer, check the following page. The property is set in conf/spark-env.sh: http://arjon.es/2014/04/14/how-to-change-default-serializer-on-apache-spark-shell/
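
As a sketch of what that looks like (the property name spark.serializer is from the Spark docs; SPARK_JAVA_OPTS is the older mechanism the linked post relies on, and is deprecated in later releases):

  # conf/spark-env.sh
  export SPARK_JAVA_OPTS="-Dspark.serializer=org.apache.spark.serializer.KryoSerializer"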

Re: java.lang.NoSuchMethodError: org.apache.spark.ui.SparkUI.addStaticHandler(Ljava/lang/String;Ljava/lang/String;

2015-11-09 Thread Hitoshi Ozawa
I think that example is included with Spark. The source code is in examples/src/main/java/org/apache/spark/examples/sql and it can be executed with the following command:

  ./bin/run-example org.apache.spark.examples.sql.JavaSparkSQL

Why is Kryo not the default serializer?

2015-11-09 Thread Hitoshi Ozawa
If Kryo usage is recommended, why is Java serialization the default serializer instead of Kryo? Is there some limitation to using Kryo? I've read through the documentation, but it just seems Kryo is the better choice and should be made the default.
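
One commonly cited caveat (background, not from this thread): Kryo gives its best results when classes are registered up front, which Java serialization never asks of you. A minimal Scala sketch, where MyRecord is a hypothetical application class:

  import org.apache.spark.SparkConf

  case class MyRecord(id: Long, value: String)  // hypothetical application class

  val conf = new SparkConf()
    .setAppName("kryo-example")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // Unregistered classes still work, but are written with their full
    // class name, which makes the serialized form larger.
    .registerKryoClasses(Array(classOf[MyRecord]))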

Re: Unwanted SysOuts in Spark Parquet

2015-11-09 Thread Hitoshi Ozawa
I'm not sure if the following will work with Parquet output, but have you tried setting sc.setLogLevel("ERROR") or setting log levels in Spark's log4j.properties file?
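
For the log4j.properties route, a sketch of the logger entries involved (the Parquet logger names below are assumptions, and they only help if Parquet's output is actually routed through log4j):

  log4j.rootCategory=ERROR, console
  log4j.logger.org.apache.spark=ERROR
  log4j.logger.org.apache.parquet=ERROR
  log4j.logger.parquet=ERROR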

Re: Scheduling Spark process

2015-11-08 Thread Hitoshi Ozawa
I'm not clear on your question about scheduling. Did you create a Spark application, and are you asking how to schedule it to run? Are you going to output results from the scheduled run to HDFS and join them in the first chain with the real-time result?

Re: Does the Standalone cluster and Applications need to be same Spark version?

2015-11-08 Thread Hitoshi Ozawa
I think it depends on the versions. Using something like 0.9.2 with 1.5.1 isn't recommended. 1.5.1 is a minor bug-fix release over 1.5.0, so most things should work, but some features may behave differently, so it's better to use the same revision. Changes between versions/releases are listed in CHANGES.txt.

Re: visualizations using the apache spark

2015-11-08 Thread Hitoshi Ozawa
You can save the result to storage (e.g. Hive) and have a web application read the data from there. I think there's also a "toJSON" method to convert a DataFrame to JSON. Another option is to use something like Spark Kernel, which gives you a Spark context (https://github.com/ibm-et/spark-kernel/wiki). Another choice is to …
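
A minimal Scala sketch of the toJSON route, assuming the spark-shell's built-in sqlContext and a hypothetical Hive table named "results" (Spark 1.5-era API, where toJSON returns an RDD[String]):

  val df = sqlContext.table("results")   // "results" is a hypothetical table
  val json = df.toJSON                   // RDD[String], one JSON object per row
  json.take(10).foreach(println)         // e.g. hand these strings to a web app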

Re: Is SPARK is the right choice for traditional OLAP query processing?

2015-11-08 Thread Hitoshi Ozawa
It depends on how much data needs to be processed. A data warehouse with indexes is going to be faster when there is not much data. If you have big data, Spark Streaming and maybe Spark SQL may interest you.

Re: why prebuilt Spark 1.5.1 still says Failed to find Spark assembly in

2015-11-08 Thread Hitoshi Ozawa
Are you sure you downloaded the pre-built version? The default is the source package. Please check that the name of the file you downloaded starts with "spark-1.5.1-bin-", with the "bin".
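
For illustration, the download naming convention looks roughly like this (the Hadoop version suffix varies by package):

  spark-1.5.1-bin-hadoop2.6.tgz   # pre-built binary package (note the "-bin-")
  spark-1.5.1.tgz                 # source release; must be built before use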

Re: How to analyze weather data in Spark?

2015-11-08 Thread Hitoshi Ozawa
There's a document describing the format of the files in the parent directory. It looks like a fixed-width format: ftp://ftp.ncdc.noaa.gov/pub/data/noaa/ish-format-document.pdf
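
A minimal Scala sketch of handling fixed-width records in Spark; the path and the substring offsets are illustrative only, not taken from the ISH format document:

  val lines = sc.textFile("hdfs:///data/noaa/1950.gz")   // hypothetical path
  val records = lines.map { line =>
    val stationId = line.substring(4, 10)   // illustrative offsets; check the
    val date      = line.substring(15, 23)  // format document for the real ones
    (stationId, date)
  }
  records.take(5).foreach(println)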

Re: Failed to save RDD as text file to local file system

2015-11-05 Thread Hitoshi Ozawa
I'm a little late, but posting in case somebody googles this. It seems saveAsTextFile requires chmod 777, but a local directory won't by default give write permission to other users. I tried saving to a mounted drive and was able to save without an error. Without the "file" scheme, it won't save to the …
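
A minimal sketch of what that looks like; the path is hypothetical, and every worker needs write access to it:

  // Explicit file:// scheme so Spark writes to the local/mounted
  // filesystem instead of the default filesystem (e.g. HDFS).
  rdd.saveAsTextFile("file:///mnt/shared/output")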

Re: Dump table into file

2015-11-05 Thread Hitoshi Ozawa
Have you tried using spark-csv? https://github.com/databricks/spark-csv E.g.:

  hiveSQLContext.sql("FROM employee SELECT name, city, state")
    .write.format("com.databricks.spark.csv")
    .save("employee.csv")

Running Apache Spark 1.5.1 on console2

2015-11-04 Thread Hitoshi Ozawa
I have Spark 1.5.1 running directly on Windows 7 but would like to run it in Console2. I have JAVA_HOME, SCALA_HOME, and SPARK_HOME set up and have verified that Java and Scala are working properly (ran -version and was able to run programs). However, when I try to use Spark via "spark-shell", it returns …