Re: Trying to run SparkSQL over Spark Streaming

2014-08-28 Thread praveshjain1991
Thanks for the reply. Sorry I could not ask more earlier. Trying to use a parquet file is not working at all. case class Rec(name:String,pv:Int) val sqlContext=new org.apache.spark.sql.SQLContext(sc) import sqlContext.createSchemaRDD val d1=sc.parallelize(Array(("a",10),("b",3))).map(e=>Rec(e._1

Re: Trying to run SparkSQL over Spark Streaming

2014-08-26 Thread praveshjain1991
d2.insertInto("data") is giving the error: "java.lang.AssertionError: assertion failed: No plan for InsertIntoTable Map(), false". Any thoughts? Thanks. Hi again, On Tue, Aug 26, 2014 at 10:13 AM, Tobias Pfeiffer <tgp@> wrote: > > On Mon, Aug 25, 2014 at 7:11 PM, praves

Spark SQL insertInto

2014-08-26 Thread praveshjain1991
I'm using SparkSql for querying. I'm trying something like: val sqc = new SQLContext(sc); import sqc.createSchemaRDD var p1 = Person("Hari",22) val rdd1 = sc.parallelize(Array(p1)) rdd1.registerAsTable("data") var p2 = Person("sagar", 22) var rdd2 = sc.parallelize(Array(p2)) rdd2.insertInto(

Re: Trying to run SparkSQL over Spark Streaming

2014-08-25 Thread praveshjain1991
Hi, Thanks for your help the other day. I had one more question regarding the same. "If you want to issue an SQL statement on streaming data, you must have both the registerAsTable() and the sql() call *within* the foreachRDD(...) block, or -- as you experienced -- the table name will be unknown"
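The quoted advice can be sketched roughly as follows. This is only an illustration under assumptions, not code from the thread: it assumes the Spark 1.0/1.1-era API discussed here (`registerAsTable`, `import sqlContext.createSchemaRDD`), and the input directory, the `Rec` fields, the table name `data` and the query are all hypothetical.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical record type; defined at top level so Spark SQL can infer its schema.
case class Rec(name: String, pv: Int)

object StreamingSqlSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingSqlSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))
    val sqlContext = new SQLContext(ssc.sparkContext)
    import sqlContext.createSchemaRDD // implicit RDD[Rec] => SchemaRDD (Spark 1.0/1.1)

    // Hypothetical input: text files with lines such as "a,10" dropped into this directory.
    val recs = ssc.textFileStream("/tmp/stream-input")
      .map(_.split(","))
      .map(a => Rec(a(0), a(1).trim.toInt))

    recs.foreachRDD { rdd =>
      // Both calls sit inside foreachRDD, so the table is registered
      // for the current batch before the query refers to it.
      rdd.registerAsTable("data")
      val result = sqlContext.sql("SELECT name, pv FROM data WHERE pv > 5")
      result.collect().foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because the table is re-registered on every batch, each query only sees the records of the batch currently being processed.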

Re: Trying to run SparkSQL over Spark Streaming

2014-08-20 Thread praveshjain1991
Oh right. Got it. Thanks. I also found this link in that discussion: https://github.com/thunderain-project/StreamSQL Does this provide more features than Spark?

Re: Trying to run SparkSQL over Spark Streaming

2014-08-20 Thread praveshjain1991
Hi, thanks for the reply and the link. It's working now. From the discussion on the link, I understand that there are some shortcomings while using SQL over streaming. The part that you mentioned "*/the variable `result` is of type DStream[Row]. That is, the meta-information from the SchemaRDD

Trying to run SparkSQL over Spark Streaming

2014-08-20 Thread praveshjain1991
I am trying to run SQL queries over streaming data in Spark. This looks pretty straightforward, but when I try it, I get the error "table not found: <tablename>". It is unable to find the table I've registered. Using Spark SQL with batch data works fine so I'm thinking it has to do with how I'm calling

Re: Spark Streaming not processing file with particular number of entries

2014-06-13 Thread praveshjain1991
If you look at the file 400k.output, you'll see the string file:/newdisk1/praveshj/pravesh/data/input/testing4lk.txt This file contains 0.4 million records. So the file is being picked up, but the app goes on to hang later on. Also you mentioned the term "Standalone cluster" in your previous reply

Re: Spark Streaming not processing file with particular number of entries

2014-06-13 Thread praveshjain1991
There doesn't seem to be any obvious reason - that's why it looks like a bug. The .4 million file is present in the directory when the context is started - same as for all other files (which are processed just fine by the application). In the logs we can see that the file is being picked up by the

Re: Spark Streaming not processing file with particular number of entries

2014-06-10 Thread praveshjain1991
Well, I was able to get it to work by running Spark over Mesos, but it looks like a bug when running Spark alone.

Shark over Spark-Streaming

2014-06-10 Thread praveshjain1991
Is it possible to use Shark over streaming data? I did not find any mention of that on the website. When you run Shark, it gives you a shell to run your queries on stored data. Is there any way to do the same over streaming data? Thanks

Re: Spark Streaming not processing file with particular number of entries

2014-06-05 Thread praveshjain1991
Hi, I am using Spark-1.0.0 over a 3 node cluster with 1 master and 2 slaves. I am trying to run LR algorithm over Spark Streaming. package org.apache.spark.examples.streaming; import java.io.BufferedReader; import java.io.BufferedWriter; import java.io.FileWriter; import jav

Re: Spark Streaming not processing file with particular number of entries

2014-06-05 Thread praveshjain1991
The same issue persists in spark-1.0.0 as well (I was using 0.9.1 earlier). Any suggestions are welcome. Thanks

Re: Spark not working with mesos

2014-06-05 Thread praveshjain1991
Hi Ajatix. Yes, HADOOP_HOME is set on the nodes and I did update the bash. As I said, adding MESOS_HADOOP_HOME did not work. But what is causing the original error: "Java.lang.Error: java.io.IOException: failure to login "? Thanks

Re: Spark not working with mesos

2014-06-04 Thread praveshjain1991
Thanks for the reply, Ajatix. Adding MESOS_HADOOP_HOME to my .bashrc gives an error while trying to start mesos-master: Failed to load unknown flag 'hadoop_home' Usage: lt-mesos-master [...] I couldn't get any help on this from Google. Any suggestions? Thanks.

Re: Spark not working with mesos

2014-06-04 Thread praveshjain1991
Thanks for the reply, Akhil. I saw the logs in /tmp/mesos and found that my tar.gz was not properly created. I corrected that, but now I get another error which I can't find an answer for on Google. The error is pretty much the same "org.apache.spark.SparkException: Job aborted: Task 0.0:6 failed 4 ti

Re: Spark not working with mesos

2014-06-04 Thread praveshjain1991
Thanks for the reply, Akhil. I created a tar.gz using make-distribution.sh, which is accessible from all the slaves (I checked it using hadoop fs -ls /path/). Also there are no worker logs printed in $SPARK_HOME/work/ directory on the workers (which are otherwise printed if I run without usin

Spark not working with mesos

2014-06-03 Thread praveshjain1991
I set up Spark-0.9.1 to run on mesos-0.13.0 using the steps mentioned here . The Mesos UI is showing two workers registered. I want to run these commands on Spark-shell > scala> val data = 1 to 1 data: > scala.collection.immutable.R

Re: Using String Dataset for Logistic Regression

2014-06-02 Thread praveshjain1991
I am not sure. I have just been using some numerical datasets.

Re: Using String Dataset for Logistic Regression

2014-06-02 Thread praveshjain1991
Thank you for your replies. I've now been using integer datasets but ran into another issue: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-not-processing-file-with-particular-number-of-entries-td6694.html Any ideas? Thanks

Spark Streaming not processing file with particular number of entries

2014-06-02 Thread praveshjain1991
Hi, I am using a Spark Streaming application to process some data over a 3 node cluster. It is, however, not processing any file that contains 0.4 million entries. Files with any other number of entries are processed fine. When running in local mode, even the 0.4 million entries file is processed fi

Re: Using String Dataset for Logistic Regression

2014-05-16 Thread praveshjain1991
Thank you for your reply. So I take it that there's no direct way of using String datasets while using LR in Spark. -Pravesh

Using String Dataset for Logistic Regression

2014-05-15 Thread praveshjain1991
I have been trying to use LR in Spark's Java API. I used the dataset that ships with Spark for training and testing. Now I want to use it on another dataset that contains string values along with numbers. Is there any way to do this? I am attaching the dataset that I want to use. T