Re: hive.contrib.serde2.RegexSerDe not found

2015-07-28 Thread Gianluca Privitera
Try using: org.apache.hadoop.hive.serde2.RegexSerDe GP On 27 Jul 2015, at 09:35, ZhuGe t...@outlook.com wrote: Hi all: I am testing the performance of Hive on Spark SQL. The existing table is created with ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
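A minimal sketch of what the suggested class swap might look like in a table definition. The table name, column names, and regex are hypothetical and not taken from the thread; the sketch only builds and prints the DDL string so that it runs without Spark or Hive on the classpath.

```scala
object RegexSerDeDdl {
  // Class name suggested in the reply. The contrib variant normally ships
  // in a separate hive-contrib jar, which may be missing from the classpath.
  val serdeClass = "org.apache.hadoop.hive.serde2.RegexSerDe"

  // Hypothetical table: two string columns, one capturing group per column.
  val ddl: String =
    s"""CREATE TABLE access_log (host STRING, request STRING)
       |ROW FORMAT SERDE '$serdeClass'
       |WITH SERDEPROPERTIES ("input.regex" = "([^ ]*) (.*)")
       |STORED AS TEXTFILE""".stripMargin

  def main(args: Array[String]): Unit = {
    // Assumption: in a Spark SQL session this string would be passed to
    // the sql(...) method of a HiveContext; here it is only printed.
    println(ddl)
  }
}
```

An existing table created with the contrib class would need to be recreated or altered accordingly; whether that is appropriate depends on the deployment.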

Spark History Server pointing to S3

2015-06-16 Thread Gianluca Privitera
On the Spark website, the View After the Fact section (https://spark.apache.org/docs/latest/monitoring.html) states that you can point the start-history-server.sh script to a directory in order to view the Web UI using the logs as the data source. Is it possible to point that script to S3?

Re: Spark History Server pointing to S3

2015-06-16 Thread Gianluca Privitera
...@sigmoidanalytics.com wrote: Not quite sure, but try pointing spark.history.fs.logDirectory to your S3 Thanks Best Regards On Tue, Jun 16, 2015 at 6:26 PM, Gianluca Privitera gianluca.privite...@studio.unibo.it wrote: In Spark
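A configuration sketch of the suggestion above, under stated assumptions: the bucket name and path are hypothetical, and the URI scheme (s3a:// here, s3n:// on older Hadoop builds) depends on the Hadoop libraries bundled with the Spark installation. S3 credentials must also be available to Hadoop, which is not shown.

```
# conf/spark-defaults.conf: applications write their event logs here
spark.eventLog.enabled   true
spark.eventLog.dir       s3a://my-bucket/spark-events

# conf/spark-env.sh: the history server reads the same location
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=s3a://my-bucket/spark-events"
```

With both settings in place, start-history-server.sh would be started without a directory argument. Whether this works against S3 in a given setup is untested here, as the reply itself notes.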

Spark Streaming w/ tshark exception problem on EC2

2014-07-15 Thread Gianluca Privitera
Hi, I’ve got a problem with Spark Streaming and tshark. While I’m running locally I have no problems with this code, but when I run it on an EC2 cluster I get the exception shown just under the code. def dissection(s: String): Seq[String] = { try { Process(hadoop command to create

Re: Where Can I find the full documentation for Spark SQL?

2014-06-25 Thread Gianluca Privitera
You can find something in the API; I think there is nothing more than that for now. Gianluca On 25 Jun 2014, at 23:36, guxiaobo1982 guxiaobo1...@qq.com wrote: Hi, I want to know the full list of functions, syntax, and features that Spark SQL supports. Is there any documentation? Regards,

Re: Access DStream content?

2014-06-12 Thread Gianluca Privitera
You can use foreachRDD and then access the RDD data. Hope this works for you. Gianluca On 12 Jun 2014, at 10:06, Wolfinger, Fred fwolfin...@cyberpointllc.com wrote: Good morning. I have a question related to Spark Streaming. I have reduced some data down to a
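A minimal sketch of the foreachRDD approach, assuming Spark Streaming is on the classpath. The queue-backed input stream, the object name, and the `seen` buffer are hypothetical stand-ins for the poster's real pipeline, and awaitTerminationOrTimeout assumes Spark 1.3 or later.

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ForeachRddSketch {
  // Driver-side buffer that receives the content of every batch.
  val seen = mutable.ArrayBuffer[Int]()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("ForeachRddSketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Hypothetical input: a single queued RDD standing in for the real source.
    val queue = mutable.Queue[RDD[Int]](ssc.sparkContext.parallelize(1 to 5))
    val stream = ssc.queueStream(queue)

    // foreachRDD exposes the RDD behind each batch; ordinary RDD actions
    // such as collect() then bring its content back to the driver.
    stream.foreachRDD { rdd =>
      val values = rdd.collect()
      seen ++= values
      println(values.mkString(", "))
    }

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000) // run for a few batches, then stop
    ssc.stop()
  }
}
```

collect() is only reasonable when each batch is small, as with data that has already been reduced; for larger batches, take(n) or writing to external storage inside foreachRDD would be the safer choice.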

Re: Increase storage.MemoryStore size

2014-06-12 Thread Gianluca Privitera
If you are launching your application with spark-submit you can manually edit the spark-class file to make it 1g as baseline. It’s pretty easy to do and to figure out how once you open the file. This worked for me even if it’s not a final solution of course. Gianluca On 12 Jun 2014, at 15:16,

Spark Streaming application not working on EC2 Cluster

2014-06-09 Thread Gianluca Privitera
Hi, I think I may have encountered some kind of bug that at the moment prevents my application from running correctly on an EC2 cluster. I'm saying that because the exact same code works wonderfully locally but behaves really strangely on the cluster. val uri = ssc.textFileStream(args(1)

Re: Spark Streaming, download a s3 file to run a script shell on it

2014-06-06 Thread Gianluca Privitera
: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi https://twitter.com/mayur_rustagi On Fri, Jun 6, 2014 at 3:00 AM, Gianluca Privitera gianluca.privite...@studio.unibo.it wrote: Hi, I've got a weird question but maybe someone

Spark Streaming window functions bug 1.0.0

2014-06-06 Thread Gianluca Privitera
Is anyone experiencing problems with windows? dstream1.print() val dstream2 = dstream1.groupByKeyAndWindow(Seconds(60)) dstream2.print() In my application the first print() prints out all the strings and their keys, but after the window function everything is lost and nothing gets printed.

Spark Streaming, download a s3 file to run a script shell on it

2014-06-05 Thread Gianluca Privitera
Hi, I've got a weird question but maybe someone has already dealt with it. My Spark Streaming application needs to - download a file from an S3 bucket, - run a script with the file as input, - create a DStream from this script output. I've already got the second part done with the rdd.pipe() API

EC2 Simple Cluster

2014-06-02 Thread Gianluca Privitera
Hi everyone, I would like to set up a very simple Spark cluster on EC2 (specifically using only 2 micro instances) and make it run a simple Spark Streaming application I created. Has anyone actually managed to do that? Because after launching the scripts from this page: