SparkSQL on Hive error

2015-10-27 Thread Anand Nalya
Hi, I have a partitioned table in Hive (Avro) that I can query fine from the Hive CLI. When using SparkSQL, I can query some of the partitions, but I get an exception on others. The query is: sqlContext.sql("select * from myTable where source='http' and date =
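
A minimal sketch (Spark 1.x Scala) of the kind of query in question, assuming the table and partition columns quoted above (myTable, source, date); the date literal is a placeholder since the original value is truncated:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object PartitionQuery {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("PartitionQuery"))
        // A HiveContext is needed to resolve tables from the Hive metastore.
        val sqlContext = new HiveContext(sc)

        // Querying one partition at a time helps narrow down which
        // partitions raise the exception.
        val df = sqlContext.sql(
          "select * from myTable where source = 'http' and date = '2015-10-01'")
        df.show()
      }
    }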

Re: Checkpoint file not found

2015-08-03 Thread Anand Nalya
(notFilter).checkpoint(interval) toNotUpdate.foreachRDD(rdd => pending = rdd) Thanks On 3 August 2015 at 13:09, Tathagata Das t...@databricks.com wrote: Can you tell us more about your streaming app? Which DStream operations are you using? On Sun, Aug 2, 2015 at 9:14 PM, Anand Nalya anand.na
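
A self-contained sketch of the pattern quoted in this thread; the stream source, filter predicate, and checkpoint interval are assumptions filled in around the snippet's names (toNotUpdate, pending):

    import org.apache.spark.SparkConf
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object CheckpointedStream {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("CheckpointedStream").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(10))
        ssc.checkpoint("hdfs:///tmp/checkpoints")  // assumed checkpoint directory

        val pairs = ssc.socketTextStream("localhost", 9999).map(w => (w, 1L))
        var pending: RDD[(String, Long)] = ssc.sparkContext.emptyRDD

        // Checkpoint at a multiple of the batch interval so the RDD lineage
        // is truncated before old checkpoint files are cleaned up.
        val toNotUpdate = pairs.filter { case (k, _) => k.nonEmpty }
        toNotUpdate.checkpoint(Seconds(60))
        toNotUpdate.foreachRDD(rdd => pending = rdd)

        ssc.start()
        ssc.awaitTermination()
      }
    }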

Checkpoint file not found

2015-08-02 Thread Anand Nalya
Hi, I'm writing a Streaming application in Spark 1.3. After running for some time, I'm getting the following exception. I'm sure that no other process is modifying the HDFS file. Any idea what might be the cause of this? 15/08/02 21:24:13 ERROR scheduler.DAGSchedulerEventProcessLoop:

Re: updateStateByKey schedule time

2015-07-21 Thread Anand Nalya
I also ran into a similar use case. Is this possible? On 15 July 2015 at 18:12, Michel Hubert mich...@vsnsystemen.nl wrote: Hi, I want to implement a time-out mechanism in the updateStateByKey(…) routine. But is there a way to retrieve the time of the start of the batch
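
The basic updateStateByKey callback does not expose the batch time, so one common workaround (a sketch, not confirmed by this thread) is to keep a last-seen timestamp in the state and drop keys that have been idle past a timeout, using wall-clock time as an approximation:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StateTimeout {
      case class State(count: Long, lastSeenMs: Long)
      val timeoutMs = 60 * 1000L  // assumed idle timeout

      // Returning None removes the key from the state stream.
      def update(values: Seq[Long], old: Option[State]): Option[State] = {
        val now = System.currentTimeMillis()
        if (values.nonEmpty)
          Some(State(old.map(_.count).getOrElse(0L) + values.sum, now))
        else old match {
          case Some(s) if now - s.lastSeenMs > timeoutMs => None  // expire the key
          case other => other
        }
      }

      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("StateTimeout").setMaster("local[2]"), Seconds(10))
        ssc.checkpoint("/tmp/state-checkpoints")  // required by updateStateByKey
        val counts = ssc.socketTextStream("localhost", 9999)
          .map(w => (w, 1L))
          .updateStateByKey(update _)
        counts.print()
        ssc.start()
        ssc.awaitTermination()
      }
    }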

Re: Breaking lineage and reducing stages in Spark Streaming

2015-07-10 Thread Anand Nalya
On Thu, Jul 9, 2015 at 8:06 AM, Anand Nalya anand.na...@gmail.com wrote: Yes, myRDD is outside of the DStream. Following is the actual code, where newBase and current are the RDDs being updated with each batch: val base = sc.textFile... var newBase

Re: Breaking lineage and reducing stages in Spark Streaming

2015-07-09 Thread Anand Nalya
it? On Thu, Jul 9, 2015 at 3:17 AM, Anand Nalya anand.na...@gmail.com wrote: That's from the Streaming tab of the Spark 1.4 WebUI. On 9 July 2015 at 15:35, Michel Hubert mich...@vsnsystemen.nl wrote: Hi, I was just wondering how you generated the second image with the charts. What product

Breaking lineage and reducing stages in Spark Streaming

2015-07-09 Thread Anand Nalya
Hi, I have an application in which an RDD is being updated with tuples coming from RDDs in a DStream, with the following pattern: dstream.foreachRDD(rdd => { myRDD = myRDD.union(rdd.filter(myfilter)).reduceByKey(_+_) }) I'm using cache() and checkpointing to cache results. Over time, the lineage
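
A sketch of one way to keep that lineage from growing without bound, reusing the names from the snippet and filling in the rest as assumptions: cache the accumulated RDD, checkpoint it every few batches, and force evaluation so the checkpoint is actually written:

    import org.apache.spark.SparkConf
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object TruncateLineage {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("TruncateLineage").setMaster("local[2]"), Seconds(10))
        val sc = ssc.sparkContext
        sc.setCheckpointDir("/tmp/rdd-checkpoints")  // assumed directory

        val myfilter = (kv: (String, Long)) => kv._2 > 0  // placeholder predicate
        var myRDD: RDD[(String, Long)] = sc.emptyRDD
        var batch = 0L

        val dstream = ssc.socketTextStream("localhost", 9999).map(w => (w, 1L))
        dstream.foreachRDD { rdd =>
          batch += 1
          val updated = myRDD.union(rdd.filter(myfilter)).reduceByKey(_ + _).cache()
          if (batch % 5 == 0) updated.checkpoint()  // cut the lineage periodically
          updated.count()                           // materialize cache and checkpoint
          myRDD.unpersist(blocking = false)
          myRDD = updated
        }
        ssc.start()
        ssc.awaitTermination()
      }
    }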

Re: Breaking lineage and reducing stages in Spark Streaming

2015-07-09 Thread Anand Nalya
That's from the Streaming tab of the Spark 1.4 WebUI. On 9 July 2015 at 15:35, Michel Hubert mich...@vsnsystemen.nl wrote: Hi, I was just wondering how you generated the second image with the charts. What product? *From:* Anand Nalya [mailto:anand.na...@gmail.com] *Sent:* Thursday 9

Re: Breaking lineage and reducing stages in Spark Streaming

2015-07-09 Thread Anand Nalya
On Thu, Jul 9, 2015 at 5:56 AM, Anand Nalya anand.na...@gmail.com wrote: The data coming from the dstream has the same keys as myRDD, so the reduceByKey after

[no subject]

2015-07-07 Thread Anand Nalya
Hi, Suppose I have an RDD that is loaded from some file, and I also have a DStream with data coming from some stream. I want to keep unioning some of the tuples from the DStream into my RDD. For this I can use something like this: var myRDD: RDD[(String, Long)] = sc.fromText...
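
A minimal sketch of the pattern being described, with the truncated load call assumed to be sc.textFile and the stream, parsing, and filter details as placeholders; note that without the caching and periodic checkpointing discussed in the "Breaking lineage" thread above, this lineage grows with every batch:

    import org.apache.spark.SparkConf
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object UnionIntoRDD {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("UnionIntoRDD").setMaster("local[2]"), Seconds(10))
        val sc = ssc.sparkContext

        // Seed RDD loaded from a file (path and parsing are placeholders).
        var myRDD: RDD[(String, Long)] = sc.textFile("/data/seed.txt").map(line => (line, 1L))

        // Merge a subset of each batch into the accumulated RDD.
        val dstream = ssc.socketTextStream("localhost", 9999).map(w => (w, 1L))
        dstream.foreachRDD { rdd =>
          myRDD = myRDD.union(rdd.filter { case (_, v) => v > 0 }).reduceByKey(_ + _)
        }
        ssc.start()
        ssc.awaitTermination()
      }
    }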

Split RDD into two in a single pass

2015-07-06 Thread Anand Nalya
Hi, I have an RDD that I want to split into two disjoint RDDs with a boolean function. I can do this with the following: val rdd1 = rdd.filter(f) val rdd2 = rdd.filter(fnot) I'm assuming that each of the above statements will traverse the RDD once, thus resulting in 2 passes. Is there a way of
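
A sketch of the usual workaround: Spark has no built-in single-pass split into two RDDs, but persisting the source means the second filter reads cached blocks instead of recomputing the whole lineage:

    import org.apache.spark.{SparkConf, SparkContext}

    object SplitRDD {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("SplitRDD").setMaster("local[2]"))
        val rdd = sc.parallelize(1 to 100).cache()  // pay the scan once

        val f = (x: Int) => x % 2 == 0  // placeholder predicate
        val rdd1 = rdd.filter(f)
        val rdd2 = rdd.filter(x => !f(x))

        println(s"matched=${rdd1.count()}, rest=${rdd2.count()}")
      }
    }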

Array fields in dataframe.write.jdbc

2015-07-02 Thread Anand Nalya
Hi, I'm using Spark 1.4. I have an array field in my DataFrame, and when I try to write this DataFrame to Postgres, I get the following exception: Exception in thread "main" java.lang.IllegalArgumentException: Can't translate null value for field
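
Spark 1.4's JDBC writer has no type mapping for array columns, which is the usual cause of this error. One workaround sketch (assumed names, credentials, and schema, not from the thread) is to flatten the array to a delimited string before calling write.jdbc:

    import java.util.Properties
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.functions.{col, udf}

    object ArrayToJdbc {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("ArrayToJdbc"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        val df = Seq(("a", Seq(1, 2, 3)), ("b", Seq(4))).toDF("id", "values")

        // Replace the array column with a comma-separated string that the
        // JDBC writer can map to a TEXT/VARCHAR column.
        val arrToStr = udf((xs: Seq[Int]) => xs.mkString(","))
        val flat = df.select(col("id"), arrToStr(col("values")).as("values"))

        val props = new Properties()
        props.setProperty("user", "postgres")      // assumed credentials
        props.setProperty("password", "postgres")
        flat.write.jdbc("jdbc:postgresql://localhost/mydb", "mytable", props)
      }
    }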