Re: Trying to run SparkSQL over Spark Streaming

2015-01-22 Thread nirandap
Hi, 

I'm also trying to use the insertInto method, but I end up getting the
assertion error.

Is there a workaround for this?

rgds



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Trying-to-run-SparkSQL-over-Spark-Streaming-tp12530p21316.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Trying to run SparkSQL over Spark Streaming

2014-08-28 Thread praveshjain1991
Thanks for the reply. Sorry I couldn't follow up on this earlier.

Trying to use a parquet file is not working at all.

case class Rec(name: String, pv: Int)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

import sqlContext.createSchemaRDD
val d1 = sc.parallelize(Array(("a", 10), ("b", 3))).map(e => Rec(e._1, e._2))
d1.saveAsParquetFile("p1.parquet")
val d2 = sc.parallelize(Array(("a", 10), ("b", 3), ("c", 5))).map(e => Rec(e._1, e._2))
d2.saveAsParquetFile("p2.parquet")
val f1 = sqlContext.parquetFile("p1.parquet")
val f2 = sqlContext.parquetFile("p2.parquet")
f1.registerAsTable("logs")
f2.insertInto("logs")

is giving the error:

java.lang.AssertionError: assertion failed: No plan for InsertIntoTable
Map(), false

same as the one that occurred while trying to insert from an RDD into a table.
So I guess inserting into Parquet tables is also not supported?
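One possible workaround (a sketch not taken from this thread, assuming the Spark 1.x SchemaRDD API) is to avoid insertInto entirely and union the two SchemaRDDs before registering the result as a table:

```scala
import org.apache.spark.sql.SQLContext

case class Rec(name: String, pv: Int)

val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD  // implicit RDD[Product] -> SchemaRDD

val d1 = sc.parallelize(Array(("a", 10), ("b", 3))).map(e => Rec(e._1, e._2))
val d2 = sc.parallelize(Array(("a", 10), ("b", 3), ("c", 5))).map(e => Rec(e._1, e._2))

// unionAll concatenates two SchemaRDDs with the same schema, so no
// InsertIntoTable plan is ever needed.
val all = d1.unionAll(d2)
all.registerAsTable("logs")
sqlContext.sql("SELECT name, SUM(pv) FROM logs GROUP BY name").collect()
```

Since temp tables registered via registerAsTable are just named query plans rather than writable storage, building the union up front sidesteps the missing InsertIntoTable plan altogether.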



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Trying-to-run-SparkSQL-over-Spark-Streaming-tp12530p13004.html



Re: Trying to run SparkSQL over Spark Streaming

2014-08-26 Thread praveshjain1991
Thanks for the reply.

Yes, it doesn't seem doable straight away. Someone suggested this:

"For each of your streams, first create an empty RDD that you register as a
table, obtaining an empty table. For your example, let's say you call it
allTeenagers.

Then, for each of your queries, use SchemaRDD's insertInto method to add the
result to that table:

teenagers.insertInto("allTeenagers")

If you do this with both your streams, creating two separate accumulation
tables, you can then join them using a plain old SQL query."

So I was trying it but can't seem to use the insertInto method in the
correct way. Something like:

val p1 = Person("Hari", 22)
val rdd1 = sc.parallelize(Array(p1))
rdd1.registerAsTable("data")

val p2 = Person("sagar", 22)
val rdd2 = sc.parallelize(Array(p2))
rdd2.insertInto("data")

is giving the error: java.lang.AssertionError: assertion failed: No plan
for InsertIntoTable Map(), false

Any thoughts?

Thanks

Hi again,

On Tue, Aug 26, 2014 at 10:13 AM, Tobias Pfeiffer <t...@preferred.jp> wrote:

 On Mon, Aug 25, 2014 at 7:11 PM, praveshjain1991 
 praveshjain1991@ wrote:

  If you want to issue an SQL statement on streaming data, you must have
 both
 the registerAsTable() and the sql() call *within* the foreachRDD(...)
 block,
 or -- as you experienced -- the table name will be unknown

 Since this is the case then is there any way to run join over data
 received
 from two different streams?


 Couldn't you do dstream1.join(dstream2).foreachRDD(...)?


 Ah, I guess you meant something like SELECT * FROM dstream1 JOIN dstream2
WHERE ...? I don't know if that is possible. Doesn't seem easy to me; I
don't think that's doable with the current codebase...

Tobias




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Trying-to-run-SparkSQL-over-Spark-Streaming-tp12530p12812.html



Re: Trying to run SparkSQL over Spark Streaming

2014-08-25 Thread praveshjain1991
Hi,

Thanks for your help the other day. I had one more question regarding the
same.

If you want to issue an SQL statement on streaming data, you must have both
the registerAsTable() and the sql() call *within* the foreachRDD(...) block,
or -- as you experienced -- the table name will be unknown

Since this is the case then is there any way to run join over data received
from two different streams?

Thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Trying-to-run-SparkSQL-over-Spark-Streaming-tp12530p12739.html



Re: Trying to run SparkSQL over Spark Streaming

2014-08-25 Thread Tobias Pfeiffer
Hi,


On Mon, Aug 25, 2014 at 7:11 PM, praveshjain1991 praveshjain1...@gmail.com
wrote:

 If you want to issue an SQL statement on streaming data, you must have
 both
 the registerAsTable() and the sql() call *within* the foreachRDD(...)
 block,
 or -- as you experienced -- the table name will be unknown

 Since this is the case then is there any way to run join over data received
 from two different streams?


Couldn't you do dstream1.join(dstream2).foreachRDD(...)?

Tobias
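For completeness, a sketch of what that suggestion might look like end to end (assuming two key-value DStreams on Spark 1.x; the hosts, ports, and record layout are made up for illustration):

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

case class Joined(key: String, lhs: Int, rhs: Int)

val ssc = new StreamingContext(sc, Seconds(10))

// Two hypothetical sources emitting "key,value" lines.
def toPairs(line: String) = { val f = line.split(","); (f(0), f(1).trim.toInt) }
val stream1 = ssc.socketTextStream("localhost", 9998).map(toPairs)
val stream2 = ssc.socketTextStream("localhost", 9999).map(toPairs)

// join() pairs up values by key within each batch; the SQL query then
// runs on the already-joined RDD inside foreachRDD.
stream1.join(stream2).foreachRDD { rdd =>
  val sqlContext = new SQLContext(rdd.sparkContext)
  import sqlContext.createSchemaRDD
  rdd.map { case (k, (l, r)) => Joined(k, l, r) }.registerAsTable("joined")
  sqlContext.sql("SELECT key, lhs, rhs FROM joined WHERE lhs > rhs")
            .collect().foreach(println)
}

ssc.start()
ssc.awaitTermination()
```

In real code you would reuse a single SQLContext rather than building one per batch; it is created inside foreachRDD here only to keep the sketch self-contained.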


Re: Trying to run SparkSQL over Spark Streaming

2014-08-25 Thread Tobias Pfeiffer
Hi again,

On Tue, Aug 26, 2014 at 10:13 AM, Tobias Pfeiffer t...@preferred.jp wrote:

 On Mon, Aug 25, 2014 at 7:11 PM, praveshjain1991 
 praveshjain1...@gmail.com wrote:

  If you want to issue an SQL statement on streaming data, you must have
 both
 the registerAsTable() and the sql() call *within* the foreachRDD(...)
 block,
 or -- as you experienced -- the table name will be unknown

 Since this is the case then is there any way to run join over data
 received
 from two different streams?


 Couldn't you do dstream1.join(dstream2).foreachRDD(...)?


 Ah, I guess you meant something like SELECT * FROM dstream1 JOIN dstream2
WHERE ...? I don't know if that is possible. Doesn't seem easy to me; I
don't think that's doable with the current codebase...

Tobias


Re: Trying to run SparkSQL over Spark Streaming

2014-08-21 Thread praveshjain1991
Hi 

Thanks for the reply and the link. It's working now.

From the discussion on the link, I understand that there are some
shortcomings while using SQL over streaming. 

The part that you mentioned: "the variable `result` is of type
DStream[Row]. That is, the meta-information from the SchemaRDD is lost and,
from what I understand, there is then no way to learn about the column names
of the returned data, as this information is only encoded in the
SchemaRDD."
Why is this bad?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Trying-to-run-SparkSQL-over-Spark-Streaming-tp12530p12535.html



Re: Trying to run SparkSQL over Spark Streaming

2014-08-21 Thread Tobias Pfeiffer
Hi,


On Thu, Aug 21, 2014 at 3:11 PM, praveshjain1991 praveshjain1...@gmail.com
wrote:

 The part that you mentioned: "the variable `result` is of type
 DStream[Row]. That is, the meta-information from the SchemaRDD is lost and,
 from what I understand, there is then no way to learn about the column
 names
 of the returned data, as this information is only encoded in the
 SchemaRDD."


Because a DStream[Row] is basically like DStream[Array[Object]]. You lose
all the information about data types in your result and there is no way to
recover it, once the schema is inaccessible. If you want to process the
data later on, you will have to check types and make assertions about the
statements that were issued before etc.

Tobias
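One way to hold on to the types (a sketch, not from the thread) is to convert each Row back into a case class at the point where the query is written, since the SELECT list, and therefore the column order, is still known there:

```scala
import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int)

// Assume `people` is a DStream[Person] built elsewhere.
people.foreachRDD { rdd =>
  val sqlContext = new SQLContext(rdd.sparkContext)
  import sqlContext.createSchemaRDD
  rdd.registerAsTable("people")
  val rows = sqlContext.sql(
    "SELECT name, age FROM people WHERE age >= 13 AND age <= 19")
  // Row access is positional: column 0 is name, column 1 is age,
  // exactly as written in the SELECT list above.
  val teenagers = rows.map(r => Person(r.getString(0), r.getInt(1)))
  teenagers.collect().foreach(println)
}
```

This only works because the conversion sits next to the query; once a bare DStream[Row] escapes that scope, the column names and types are gone, which is exactly the problem described above.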


Re: Trying to run SparkSQL over Spark Streaming

2014-08-21 Thread praveshjain1991
Oh right. Got it. Thanks

Also found this link on that discussion:
https://github.com/thunderain-project/StreamSQL

Does this provide more features than Spark?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Trying-to-run-SparkSQL-over-Spark-Streaming-tp12530p12538.html



RE: Trying to run SparkSQL over Spark Streaming

2014-08-21 Thread Shao, Saisai
Hi,

StreamSQL (https://github.com/thunderain-project/StreamSQL) is a POC project 
based on Spark that combines the power of Catalyst and Spark Streaming, offering 
the ability to run SQL on top of DStreams as you wanted. It keeps the same 
semantics as Spark SQL by providing a SchemaDStream on top of DStream, so you 
don't need tricky things like extracting the RDD and registering it as a table. 
All other parts are the same as in Spark.

Thanks
Jerry

-Original Message-
From: praveshjain1991 [mailto:praveshjain1...@gmail.com] 
Sent: Thursday, August 21, 2014 2:25 PM
To: u...@spark.incubator.apache.org
Subject: Re: Trying to run SparkSQL over Spark Streaming

Oh right. Got it. Thanks

Also found this link on that discussion:
https://github.com/thunderain-project/StreamSQL

Does this provide more features than Spark?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Trying-to-run-SparkSQL-over-Spark-Streaming-tp12530p12538.html



Re: Trying to run SparkSQL over Spark Streaming

2014-08-20 Thread Tobias Pfeiffer
Hi,


On Thu, Aug 21, 2014 at 2:19 PM, praveshjain1991 praveshjain1...@gmail.com
wrote:

 Using Spark SQL with batch data works fine so I'm thinking it has to do
 with
 how I'm calling streamingcontext.start(). Any ideas what is the issue? Here
 is the code:


Please have a look at


http://apache-spark-user-list.1001560.n3.nabble.com/Some-question-about-SQL-and-streaming-tp9229p9254.html

If you want to issue an SQL statement on streaming data, you must have both
the registerAsTable() and the sql() call *within* the foreachRDD(...)
block, or -- as you experienced -- the table name will be unknown.

Tobias
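A minimal sketch of that pattern (the source host/port and the "name,age" record layout are assumptions for illustration):

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

case class Person(name: String, age: Int)

val ssc = new StreamingContext(sc, Seconds(10))
val lines = ssc.socketTextStream("localhost", 9999)  // hypothetical source
val people = lines.map(_.split(",")).map(f => Person(f(0), f(1).trim.toInt))

people.foreachRDD { rdd =>
  // Both the registration and the query live inside foreachRDD, so the
  // table name is defined in the same scope where sql() runs.
  val sqlContext = new SQLContext(rdd.sparkContext)
  import sqlContext.createSchemaRDD
  rdd.registerAsTable("people")
  sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
            .collect().foreach(println)
}

ssc.start()
ssc.awaitTermination()
```

Moving either the registerAsTable() or the sql() call outside the foreachRDD block reproduces the "table not found" problem discussed earlier in the thread, because each batch RDD only exists within that block.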