Could you show the code that starts the StreamingQuery from the Dataset? If
you don't call `writeStream.start(...)`, nothing will run.
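For reference, a minimal sketch of the pattern (source, host, and port here are illustrative) — nothing executes until `start()` is called, which is what returns the StreamingQuery:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQuery

val spark = SparkSession.builder.appName("demo").getOrCreate()

// Streaming Dataset of lines from a socket source.
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

// The query only runs once start() is called; it returns the StreamingQuery.
val query: StreamingQuery = lines.writeStream
  .format("console")
  .outputMode("append")
  .start()

query.awaitTermination()
```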
On Fri, Jun 30, 2017 at 6:47 AM, pradeepbill wrote:
> hi there, I have a spark streaming issue that i am not able to figure out ,
This does sound like a good use case for that feature. Note that Spark
2.2 adds a similar [flat]MapGroupsWithState operation to structured
streaming. Stay tuned for a blog post on that!
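A rough sketch of what that looks like in Spark 2.2 (the Event/RunningCount case classes and the per-user count are illustrative, not from the thread):

```scala
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

case class Event(user: String, value: Long)
case class RunningCount(user: String, count: Long)

// events: Dataset[Event] from a streaming source.
// flatMapGroupsWithState keeps arbitrary per-key state across micro-batches.
val counts = events
  .groupByKey(_.user)
  .flatMapGroupsWithState(OutputMode.Update, GroupStateTimeout.NoTimeout) {
    (user: String, batch: Iterator[Event], state: GroupState[Long]) =>
      val newCount = state.getOption.getOrElse(0L) + batch.size
      state.update(newCount)
      Iterator(RunningCount(user, newCount))
  }
```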
On Thu, Jun 29, 2017 at 6:11 PM, kant kodali wrote:
> Is mapWithState an answer for
Put the default value inside lit:
df.withColumn("date", lit("constant value"))
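Spelled out (the method is `withColumn`, and both functions live in `org.apache.spark.sql.functions`, so this compiles in a jar the same as in the shell):

```scala
import org.apache.spark.sql.functions.{lit, current_date}

// Constant default value:
val withConst = df.withColumn("date", lit("constant value"))

// System date (the compiled-code equivalent of current_date in spark-shell):
val withDate = df.withColumn("date", current_date())
```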
On Fri, Jun 30, 2017 at 10:20 PM, sudhir k wrote:
> Can we add a column to a dataframe with a default value like sysdate? I am
> calling my udf but it is throwing the error "col expected".
>
> On spark
Can we add a column to a dataframe with a default value like sysdate? I am
calling my udf but it is throwing the error "col expected".
On the spark shell
df.withColumn("date", current_date) works; I need something similar for a Scala program
which I can build into a jar.
Thanks,
Sudhir
--
Sent from Gmail Mobile
In this case I do not see many benefits of using Spark. Is the data volume
high?
Alternatively, I recommend converting the proprietary format into a format
Spark understands and then using that format in Spark.
Another alternative would be to write a custom Spark datasource. Even your
Hi Mahesh and Ayan,
The files I'm working with are in a very complex proprietary format, for which
I only have access to a reader function, as I described earlier, and it
only accepts a path on the local file system.
This rules out sc.wholeTextFiles - since I cannot pass the contents of
wholeTextFiles
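One hedged sketch of a workaround under those constraints: parallelize the file locations, copy each file to the executor's local disk, and only then invoke the local-path-only reader. Here `readProprietary` and `fetchToLocal` are placeholders for the vendor reader and for however the bytes are actually retrieved (HDFS API, S3 client, NFS mount, ...):

```scala
import java.nio.file.Files

// remotePaths: the input file locations the reader cannot open directly.
val records = sc.parallelize(remotePaths).flatMap { remote =>
  // Copy to the executor's local disk first, since the reader
  // only accepts a local filesystem path.
  val local = Files.createTempFile("prop-", ".dat")
  fetchToLocal(remote, local)          // placeholder fetch helper
  try readProprietary(local.toString)  // vendor's local-file reader
  finally Files.deleteIfExists(local)
}
```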
hi there, I have a spark streaming issue that I am not able to figure out.
The code below reads from a socket, but I don't see any input going into the
job. I have nc -l running and dumping data, though; not sure why my
spark job is not able to read data from 10.176.110.112:. Please advise.
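For comparison, a minimal socket DStream sketch (the port 9999 is illustrative — the original message elides it). Two common gotchas: nothing is consumed until `ssc.start()` is called, and with many nc builds plain `nc -l` closes after the first connection, so `nc -lk <port>` is usually what you want:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("socket-demo")
val ssc = new StreamingContext(conf, Seconds(1))

// Host and port must match the machine running `nc -lk <port>`.
val lines = ssc.socketTextStream("10.176.110.112", 9999)
lines.print()

ssc.start()              // nothing is consumed until start() is called
ssc.awaitTermination()
```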
Or, since you already use the DataFrame API instead of SQL, you can add the
broadcast function to force it.
https://spark.apache.org/docs/1.6.2/api/java/org/apache/spark/sql/functions.html#broadcast(org.apache.spark.sql.DataFrame)
Yong
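In code, that hint looks like this (largeDF/smallDF and the join key are illustrative):

```scala
import org.apache.spark.sql.functions.broadcast

// Hint the planner to broadcast the smaller side regardless of
// spark.sql.autoBroadcastJoinThreshold.
val joined = largeDF.join(broadcast(smallDF), Seq("key"))
```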
Hello.
If you want to allow broadcast joins with larger tables, you can set
spark.sql.autoBroadcastJoinThreshold to a higher value. This will cause the
planner to allow the join despite 'A' being larger than the default threshold.
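For example (100 MB is an illustrative value, not a recommendation; the setting is in bytes, and -1 disables auto-broadcast entirely):

```scala
// Raise the size threshold (in bytes) under which a table is
// automatically broadcast in a join.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 100 * 1024 * 1024)
```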
Get Outlook for Android
From: paleyl
We have got data stored in S3 partitioned by several columns. Let's say
following this hierarchy:
s3://bucket/data/column1=X/column2=Y/parquet-files
We run a Spark job on an EMR cluster (1 master, 3 slaves) and realised the
following:
A) - When we declare the initial dataframe to be the whole
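With that layout, a sketch of the two usual ways to read a partition subset (paths and values are taken from the example hierarchy above): filtering on the partition columns lets Spark prune to only the matching S3 prefixes, while pointing the reader directly at one partition skips listing the rest entirely.

```scala
// Partition pruning: Spark only lists/reads prefixes matching the filter.
val pruned = spark.read.parquet("s3://bucket/data")
  .filter($"column1" === "X" && $"column2" === "Y")

// Direct read of a single partition directory (note: the partition
// columns are then absent from the schema unless a basePath is set).
val direct = spark.read.parquet("s3://bucket/data/column1=X/column2=Y")
```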
Hi All,
Recently I ran into a problem with broadcast join: I want to left join tables A
and B; A is the smaller one and the left table, so I wrote
A = A.join(B, A("key1") === B("key2"), "left")
but I found that A is not broadcast, as the shuffle size is still very
large.
I guess this is a designed
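For what it's worth, this does appear to be by design: in a left outer join only the right side can serve as the broadcast (build) side, since every left-side row must be preserved, so a hint on A is ignored. Only B could be broadcast here, and only if it is small enough:

```scala
import org.apache.spark.sql.functions.broadcast

// Broadcasting the right (non-preserved) side is the only option
// for a left outer join; broadcasting A itself is not supported.
val joined = A.join(broadcast(B), A("key1") === B("key2"), "left")
```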