Re: spark streaming socket read issue

2017-06-30 Thread Shixiong(Ryan) Zhu
Could you show the code that starts the StreamingQuery from the Dataset? If you don't call `writeStream.start(...)`, it won't run anything. On Fri, Jun 30, 2017 at 6:47 AM, pradeepbill wrote: > hi there, I have a Spark Streaming issue that I am not able to figure out,
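For reference, a minimal sketch of a socket-source query that is actually started; the host is taken from the original post, while the port, sink, and query shape are illustrative assumptions (the port is truncated in the post):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("SocketRead").getOrCreate()

// Socket source; host from the original post, port 9999 is an assumption.
val lines = spark.readStream
  .format("socket")
  .option("host", "10.176.110.112")
  .option("port", "9999")
  .load()

// Nothing executes until writeStream.start(...) is called; readStream
// alone only defines the query, it never runs it.
val query = lines.writeStream
  .format("console")
  .start()

query.awaitTermination()
```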

Re: Interesting Stateful Streaming question

2017-06-30 Thread Michael Armbrust
This does sound like a good use case for that feature. Note that Spark 2.2 adds a similar [flat]MapGroupsWithState operation to Structured Streaming. Stay tuned for a blog post on that! On Thu, Jun 29, 2017 at 6:11 PM, kant kodali wrote: > Is mapWithState an answer for
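For the curious, a minimal sketch of the Spark 2.2 API mentioned above; the Event/RunningCount classes and the per-key counting logic are illustrative, not from the thread:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

case class Event(key: String, value: Long)
case class RunningCount(key: String, count: Long)

val spark = SparkSession.builder().appName("StatefulCounts").getOrCreate()
import spark.implicits._

val events = spark.readStream
  .format("socket").option("host", "localhost").option("port", "9999").load()
  .as[String]
  .map(line => Event(line, 1L))

// Keep a running count per key across micro-batches.
val counts = events
  .groupByKey(_.key)
  .mapGroupsWithState(GroupStateTimeout.NoTimeout) {
    (key: String, values: Iterator[Event], state: GroupState[Long]) =>
      val newCount = state.getOption.getOrElse(0L) + values.size
      state.update(newCount) // state survives into the next trigger
      RunningCount(key, newCount)
  }

// mapGroupsWithState requires the "update" output mode.
counts.writeStream.outputMode("update").format("console").start()
```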

Re: Withcolumn date with sysdate

2017-06-30 Thread Pralabh Kumar
put the default value inside lit: df.withColumn("date", lit("constant value")) On Fri, Jun 30, 2017 at 10:20 PM, sudhir k wrote: > Can we add a column to a dataframe with a default value like sysdate? I am > calling my udf, but it is throwing the error "col expected". > > On spark
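A short sketch combining both suggestions in this thread; `df` stands for the poster's existing DataFrame:

```scala
import org.apache.spark.sql.functions.{current_date, lit}

// Sysdate-like column: use the built-in current_date() function.
val withDate = df.withColumn("date", current_date())

// Arbitrary constant default: wrap the value in lit() to get a Column.
val withConst = df.withColumn("source", lit("constant value"))
```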

Withcolumn date with sysdate

2017-06-30 Thread sudhir k
Can we add a column to a dataframe with a default value like sysdate? I am calling my udf, but it is throwing the error "col expected". In the spark shell, df.withColumn("date", current_date) works; I need something similar in a Scala program which I can build into a jar. Thanks, Sudhir -- Sent from Gmail Mobile

Re: PySpark working with Generators

2017-06-30 Thread Jörn Franke
In this case I do not see many benefits to using Spark. Is the data volume high? Alternatively, I recommend converting the proprietary format into a format Spark understands and then using that format in Spark. Another alternative would be to write a custom Spark data source. Even your

Re: PySpark working with Generators

2017-06-30 Thread Saatvik Shah
Hi Mahesh and Ayan, The files I'm working with are in a very complex proprietary format, for which I only have access to a reader function, as described earlier, that only accepts a path on a local file system. This rules out sc.wholeTextFile, since I cannot pass the contents of wholeTextFile
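One common workaround for a reader that only accepts local paths is to distribute the paths rather than the contents; a sketch under that assumption, written in Scala to match the rest of this digest (the same pattern works in PySpark with sc.parallelize(paths).flatMap(...)); readProprietary is a hypothetical stand-in for the poster's reader:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical stand-in for the proprietary reader, which takes a path
// on the local file system of whichever executor runs it.
def readProprietary(localPath: String): Iterator[String] =
  scala.io.Source.fromFile(localPath).getLines()

val spark = SparkSession.builder().appName("DistributePaths").getOrCreate()

// Paths must be visible on every executor (e.g. a shared mount, or add a
// per-task download step before calling the reader).
val paths = Seq("/mnt/shared/file1.bin", "/mnt/shared/file2.bin")

val records = spark.sparkContext
  .parallelize(paths, paths.size) // one partition per file
  .flatMap(readProprietary)
```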

spark streaming socket read issue

2017-06-30 Thread pradeepbill
hi there, I have a Spark Streaming issue that I am not able to figure out. The code below reads from a socket, but I don't see any input going into the job. I have nc -l running and dumping data, though; not sure why my Spark job is not able to read data from 10.176.110.112:. Please advise.

Re: about broadcast join of base table in spark sql

2017-06-30 Thread Yong Zhang
Or, since you already use the DataFrame API instead of SQL, you can add the broadcast function to force it. https://spark.apache.org/docs/1.6.2/api/java/org/apache/spark/sql/functions.html#broadcast(org.apache.spark.sql.DataFrame) Yong
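A sketch of that suggestion with the table names from the original post; the caveat in the comment is my addition rather than part of the thread:

```scala
import org.apache.spark.sql.functions.broadcast

// Hint that A (the small table) should be broadcast.
val joined = broadcast(A).join(B, A("key1") === B("key2"), "left")

// Caveat: in a left outer join Spark can only broadcast the right-hand
// table, so a hint on the preserved left side may be ignored by the planner.
```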

Re: about broadcast join of base table in spark sql

2017-06-30 Thread Bryan Jeffrey
Hello. If you want to allow a broadcast join with larger broadcasts, you can set spark.sql.autoBroadcastJoinThreshold to a higher value. This will cause the planner to allow the broadcast join despite 'A' being larger than the default threshold. From: paleyl Sent:
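A sketch of the setting, assuming Spark 2.x; the 100 MB value is illustrative (the default is 10 MB, and -1 disables broadcast joins entirely):

```scala
// Raise the size limit below which Spark auto-broadcasts a join side.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 100 * 1024 * 1024)

// Equivalent SQL form:
spark.sql("SET spark.sql.autoBroadcastJoinThreshold=104857600")
```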

Spark querying parquet data partitioned in S3

2017-06-30 Thread Francisco Blaya
We have data stored in S3, partitioned by several columns, say following this hierarchy: s3://bucket/data/column1=X/column2=Y/parquet-files. We run a Spark job on an EMR cluster (1 master, 3 slaves) and realised the following: A) When we declare the initial dataframe to be the whole
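For context, a sketch of reading that layout; the bucket path follows the example above, and filtering on the partition columns lets Spark prune to the matching prefixes rather than scanning everything:

```scala
val df = spark.read.parquet("s3://bucket/data")

// Filters on partition columns (column1, column2) prune the S3 listing
// to the matching directories instead of scanning the full dataset.
val pruned = df.filter(df("column1") === "X" && df("column2") === "Y")
```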

Fwd: about broadcast join of base table in spark sql

2017-06-30 Thread paleyl
Hi All, Recently I ran into a problem with a broadcast join: I want to left join tables A and B, where A is the smaller one and the left table, so I wrote A = A.join(B, A("key1") === B("key2"), "left"), but I found that A is not broadcast out, as the shuffle size is still very large. I guess this is a designed