Re: Trying to run SparkSQL over Spark Streaming

2014-08-28 Thread praveshjain1991
Thanks for the reply. Sorry I could not ask more earlier. Trying to use a parquet file is not working at all. case class Rec(name:String,pv:Int) val sqlContext=new org.apache.spark.sql.SQLContext(sc) import sqlContext.createSchemaRDD val
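The snippet above is cut off by the archive. A minimal sketch of the Spark 1.0-era Parquet round trip it appears to be attempting (paths, field values, and the `ParquetSketch` object name are illustrative, not from the original post):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

case class Rec(name: String, pv: Int)

object ParquetSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "parquet-sketch")
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD // implicit RDD[Rec] -> SchemaRDD conversion

    // Write a small SchemaRDD out as a Parquet file, then read it back.
    val recs = sc.parallelize(Seq(Rec("a", 1), Rec("b", 2)))
    recs.saveAsParquetFile("recs.parquet")
    val loaded = sqlContext.parquetFile("recs.parquet")

    // Register the loaded data as a table and query it.
    loaded.registerAsTable("recs")
    sqlContext.sql("SELECT name FROM recs WHERE pv > 1").collect().foreach(println)
    sc.stop()
  }
}
```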

Re: Trying to run SparkSQL over Spark Streaming

2014-08-26 Thread praveshjain1991
: No plan for InsertIntoTable Map(), false Any thoughts? Thanks Hi again, On Tue, Aug 26, 2014 at 10:13 AM, Tobias Pfeiffer <tgp@> wrote: On Mon, Aug 25, 2014 at 7:11 PM, praveshjain1991 praveshjain1991@ wrote: If you want to issue an SQL statement on streaming data, you must have

Re: Trying to run SparkSQL over Spark Streaming

2014-08-25 Thread praveshjain1991
Hi, Thanks for your help the other day. I had one more question regarding the same. If you want to issue an SQL statement on streaming data, you must have both the registerAsTable() and the sql() call *within* the foreachRDD(...) block, or -- as you experienced -- the table name will be unknown
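The advice quoted above can be sketched as follows, using the Spark 1.0-era streaming API. Both `registerAsTable()` and `sql()` sit inside `foreachRDD`, so the table is (re-)registered for every batch; the input path and `Rec` schema are illustrative assumptions:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

case class Rec(name: String, pv: Int)

object StreamingSqlSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-sql").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Illustrative CSV-like input directory; each line is "name,pv".
    val recs = ssc.textFileStream("input/")
      .map(_.split(","))
      .map(a => Rec(a(0), a(1).toInt))

    recs.foreachRDD { rdd =>
      val sqlContext = new SQLContext(rdd.sparkContext)
      import sqlContext.createSchemaRDD
      // Register and query within the same block, once per batch,
      // so the table name is known when sql() runs.
      rdd.registerAsTable("recs")
      sqlContext.sql("SELECT name, pv FROM recs WHERE pv > 10")
        .collect()
        .foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```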

Re: Trying to run SparkSQL over Spark Streaming

2014-08-21 Thread praveshjain1991
Hi Thanks for the reply and the link. It's working now. From the discussion on the link, I understand that there are some shortcomings while using SQL over streaming. The part that you mentioned, "the variable `result` is of type DStream[Row]. That is, the meta-information from the SchemaRDD

Re: Trying to run SparkSQL over Spark Streaming

2014-08-21 Thread praveshjain1991
Oh right. Got it. Thanks Also found this link on that discussion: https://github.com/thunderain-project/StreamSQL Does this provide more features than Spark? -- View this message in context:

Trying to run SparkSQL over Spark Streaming

2014-08-20 Thread praveshjain1991
I am trying to run SQL queries over streaming data in Spark. This looks pretty straightforward, but when I try it, I get the error "table not found: tablename". It is unable to find the table I've registered. Using Spark SQL with batch data works fine, so I'm thinking it has to do with how I'm calling
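A hedged sketch of the likely cause of the "table not found" error described above (names and paths are illustrative): in streaming code, a table registered inside `foreachRDD` only exists for that batch, so a `sql()` call issued outside that block, or against a different `SQLContext`, cannot see it:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

case class Rec(name: String, pv: Int)

object TableNotFoundSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))
    val recs = ssc.textFileStream("input/")
      .map(_.split(","))
      .map(a => Rec(a(0), a(1).toInt))

    recs.foreachRDD { rdd =>
      val sqlContext = new SQLContext(rdd.sparkContext)
      import sqlContext.createSchemaRDD
      rdd.registerAsTable("recs")
      // Querying here works; querying "recs" outside this block, or via
      // another SQLContext, is what yields "table not found: recs".
      println(sqlContext.sql("SELECT COUNT(*) FROM recs").collect().mkString)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```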

Re: Spark Streaming not processing file with particular number of entries

2014-06-13 Thread praveshjain1991
There doesn't seem to be any obvious reason - that's why it looks like a bug. The 0.4-million-record file is present in the directory when the context is started - same as for all the other files (which are processed just fine by the application). In the logs we can see that the file is being picked up by

Re: Spark Streaming not processing file with particular number of entries

2014-06-13 Thread praveshjain1991
If you look at the file 400k.output, you'll see the string file:/newdisk1/praveshj/pravesh/data/input/testing4lk.txt This file contains 0.4 mn records. So the file is being picked up but the app goes on to hang later on. Also you mentioned the term Standalone cluster in your previous reply

Re: Spark Streaming not processing file with particular number of entries

2014-06-11 Thread praveshjain1991
Well, I was able to get it to work by running Spark over Mesos. But it looks like a bug while running Spark alone. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-not-processing-file-with-particular-number-of-entries-tp6694p7382.html Sent

Shark over Spark-Streaming

2014-06-10 Thread praveshjain1991
Is it possible to use Shark over streaming data? I did not find any mention of that on the website. When you run Shark, it gives you a shell to run your queries over stored data. Is there any way to do the same over streaming data? -- Thanks -- View this message in context:

Re: Spark Streaming not processing file with particular number of entries

2014-06-06 Thread praveshjain1991
Hi, I am using Spark-1.0.0 over a 3-node cluster with 1 master and 2 slaves. I am trying to run the LR algorithm over Spark Streaming. package org.apache.spark.examples.streaming; import java.io.BufferedReader; import java.io.BufferedWriter; import java.io.FileWriter; import

Re: Spark not working with mesos

2014-06-05 Thread praveshjain1991
Hi Ajatix. Yes, the HADOOP_HOME is set on the nodes and I did update the bash. As I said, adding MESOS_HADOOP_HOME did not work. But what is causing the original error: java.lang.Error: java.io.IOException: failure to login? -- Thanks -- View this message in context:

Re: Spark Streaming not processing file with particular number of entries

2014-06-05 Thread praveshjain1991
The same issue persists in spark-1.0.0 as well (was using 0.9.1 earlier). Any suggestions are welcomed. -- Thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-not-processing-file-with-particular-number-of-entries-tp6694p7056.html Sent

Re: Spark not working with mesos

2014-06-04 Thread praveshjain1991
Thanks for the reply, Akhil. I created a tar.gz using make-distribution.sh which is accessible from all the slaves (I checked it using hadoop fs -ls /path/). Also, there are no worker logs printed in the $SPARK_HOME/work/ directory on the workers (which are otherwise printed if I run without

Re: Using String Dataset for Logistic Regression

2014-06-03 Thread praveshjain1991
I am not sure. I have just been using some numerical datasets. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Using-String-Dataset-for-Logistic-Regression-tp5523p6784.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Spark not working with mesos

2014-06-03 Thread praveshjain1991
I set up Spark-0.9.1 to run on Mesos-0.13.0 using the steps mentioned here https://spark.apache.org/docs/0.9.1/running-on-mesos.html . The Mesos UI is showing two workers registered. I want to run these commands in spark-shell scala val data = 1 to 1 data: