Hello all,
I'm trying to build a pipeline that reads data from a streaming source and writes it to ORC files, but I don't see any files written to the file system, nor any exceptions.
Here is an example:
val df = spark.readStream.format("...")
  .option("Topic", "Some...")
from pyspark.sql import Row
A_DF = sc.parallelize(
[
Row(id='A123', name='a1'),
Row(id='A234', name='a2')
]).toDF()
B_DF = sc.parallelize(
[
Row(id='A123', pid='A234', ename='e1')
]).toDF()
join_df = B_DF.join(A_DF, B_DF.id==A_DF.id).drop(B_DF.id)
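If the goal is just to avoid the duplicate id column, joining on the column name sidesteps it entirely; a minimal alternative sketch:

# Passing the column name merges the two id columns into one, so no drop() is needed:
join_df = B_DF.join(A_DF, "id")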
I have questions about using pyspark 2.1.1 to push data to Kafka.
I don't see any pyspark streaming API to write data directly to Kafka; if there is one, or an example, please point me to the right page.
I implemented my own way, which uses a global Kafka producer and pushes the data picked from
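For reference, as of Spark 2.1 there is no built-in Kafka sink for pyspark (one arrived for Structured Streaming in 2.2), so a common workaround is to create the producer inside foreachPartition rather than globally, since a global producer cannot be serialized to the executors. A minimal sketch using the third-party kafka-python package; the DataFrame name, broker, and topic are made up:

from kafka import KafkaProducer  # pip install kafka-python

def send_partition(rows):
    # One producer per partition, created on the executor, never serialized.
    producer = KafkaProducer(bootstrap_servers="broker:9092")
    for row in rows:
        producer.send("output-topic", value=str(row).encode("utf-8"))
    producer.flush()
    producer.close()

df.rdd.foreachPartition(send_partition)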
I have a small application like this
from threading import Thread
from time import sleep

acc = sc.accumulator(5)  # was sc.accumulate(5), which does not exist

def t_f(x):
    global acc
    sleep(5)
    acc += x

def f(x):
    thread = Thread(target=t_f, args=(x,))
    thread.start()
    # thread.join()  # without this it doesn't work

rdd = sc.parallelize([1, 2, 4, 1])
rdd.foreach(f)
Does anyone have any idea on this?
On Tue, Jul 22, 2014 at 7:02 PM, hsy...@gmail.com hsy...@gmail.com wrote:
But how do they do the interactive sql in the demo?
https://www.youtube.com/watch?v=dJQ5lV5Tldw
And if it can work in local mode, I think it should be able to work in cluster mode.
Hi guys,
I'm able to run some Spark SQL examples, but the SQL is static in the code. I would like to know: is there a way to read the SQL from somewhere else (a shell, for example)?
I could read the SQL statement from Kafka/ZooKeeper, but I cannot share the SQL with all workers; broadcast seems not working for
in the code? What do you mean by "cannot share the SQL to all workers"?
On Tue, Jul 22, 2014 at 4:03 PM, hsy...@gmail.com hsy...@gmail.com
wrote:
Hi guys,
I'm able to run some Spark SQL examples, but the SQL is static in the code. I would like to know is there a way to read the SQL from somewhere
))
})
ssc.start()
ssc.awaitTermination()
On Tue, Jul 22, 2014 at 5:10 PM, Zongheng Yang zonghen...@gmail.com wrote:
Can you paste a small code example to illustrate your questions?
On Tue, Jul 22, 2014 at 5:05 PM, hsy...@gmail.com hsy...@gmail.com
wrote:
Sorry, typo. What I mean
after the StreamingContext has started.
Tobias
On Wed, Jul 23, 2014 at 9:55 AM, hsy...@gmail.com hsy...@gmail.com
wrote:
For example, this is what I tested, and it works in local mode. What it does is get the data and the SQL query both from Kafka, run the SQL on each RDD, and output the result back.
I have the same problem
On Sat, Jul 19, 2014 at 12:31 AM, lihu lihu...@gmail.com wrote:
Hi,
Everyone, I have the following piece of code. When I run it, it produces an error like the one below. It seems that the SparkContext is not serializable, but I do not try to use the SparkContext
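The code itself is missing from this archive, but for reference, this error usually means a task closure captured the SparkContext indirectly (often through an enclosing object). A hypothetical pyspark illustration:

rdd = sc.parallelize(range(10))

# Bad: the lambda captures sc, so Spark tries to ship the SparkContext to
# the workers and raises an error, even when sc is never used deliberately
# (commented out so this sketch runs):
# rdd.map(lambda x: sc.parallelize([x]).count()).collect()

# Fine: only plain data enters the closure; sc stays on the driver.
doubled = rdd.map(lambda x: x * 2).collect()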
Thanks Tathagata. So can I say the RDD size (from the stream) is the window size, and the overlap between 2 adjacent RDDs is the sliding size?
But I still don't understand what the batch size is. Why do we need it, since data processing is RDD by RDD, right?
And does Spark chop the data into RDDs at the very
When I'm reading the API of spark streaming, I'm confused by the 3
different durations
StreamingContext(conf: SparkConf, batchDuration: Duration)
(http://spark.apache.org/docs/latest/api/scala/org/apache/spark/SparkConf.html)
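For reference, a small pyspark sketch of how the three durations relate; the source and the numbers are made up:

from pyspark.streaming import StreamingContext

# Batch duration: the base interval at which the stream is chopped into RDDs.
ssc = StreamingContext(sc, batchDuration=2)      # one RDD every 2 seconds
lines = ssc.socketTextStream("localhost", 9999)  # hypothetical source

# Window duration: how much data each windowed RDD covers.
# Slide duration: how often a new windowed RDD is produced.
# Both must be multiples of the batch duration.
windowed = lines.window(windowDuration=10, slideDuration=4)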
reproduce it.
On Mon, Jul 14, 2014 at 7:36 PM, hsy...@gmail.com hsy...@gmail.com
wrote:
Before yarn application -kill, if you do jps you'll see a list of SparkSubmit and ApplicationMaster processes.
After you use yarn application -kill, you only kill the SparkSubmit.
On Mon, Jul 14, 2014 at 4:29 PM
sure it works, and you see output?
Also, I recommend going through the previous step-by-step approach to
narrow down where the problem is.
TD
On Mon, Jul 14, 2014 at 9:15 PM, hsy...@gmail.com hsy...@gmail.com
wrote:
Actually, I deployed this on a YARN cluster (spark-submit) and I couldn't find anything in the driver logs!
So try doing a collect or a take on the RDD returned by the SQL query, and print that.
TD
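A minimal pyspark sketch of that suggestion, using later APIs; the table name and query are made up:

result = spark.sql("SELECT * FROM some_table")  # whatever the query returns
for row in result.take(10):                     # or result.collect() for everything
    print(row)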
On Tue, Jul 15, 2014 at 4:28 PM, hsy...@gmail.com hsy...@gmail.com
wrote:
By the way, have you ever run SQL and streaming together? Do you know of any example that works? Thanks!
On Tue
Hi all,
A newbie question: I start a Spark YARN application through spark-submit. How do I kill this app? I can kill the YARN app with yarn application -kill appid, but the ApplicationMaster is still running. What's the proper way to shut down the entire app?
Best,
Siyuan
Hi All,
A couple of days ago, I tried to integrate SQL and streaming together. My understanding is that I can transform the RDDs from a DStream to SchemaRDDs and execute SQL on each RDD, but I've had no luck.
Would you guys help me take a look at my code? Thank you very much!
object KafkaSpark {
def main(args:
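The Scala source is truncated in this archive, but as a rough illustration of the approach described (turn each batch into a SchemaRDD, today a DataFrame, and run SQL over it), here is a minimal pyspark sketch using later APIs; the source, schema, and query are made up:

from pyspark.sql import SparkSession
from pyspark.streaming import StreamingContext

spark = SparkSession.builder.appName("KafkaSpark").getOrCreate()
ssc = StreamingContext(spark.sparkContext, batchDuration=5)
lines = ssc.socketTextStream("localhost", 9999)  # stand-in for the Kafka source

def process(time, rdd):
    if rdd.isEmpty():
        return
    # The query string is used on the driver here, so it never needs to be
    # broadcast to the workers.
    df = spark.createDataFrame(rdd.map(lambda w: (w,)), ["word"])
    df.createOrReplaceTempView("words")
    spark.sql("SELECT word, COUNT(*) AS cnt FROM words GROUP BY word").show()

lines.flatMap(lambda line: line.split(" ")).foreachRDD(process)
ssc.start()
ssc.awaitTermination()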
. This is what I did 2 hours
ago.
Sorry I cannot provide more help.
Sent from my iPhone
On 14 Jul, 2014, at 6:05 pm, hsy...@gmail.com hsy...@gmail.com wrote:
yarn-cluster
On Mon, Jul 14, 2014 at 2:44 PM, Jerry Lam chiling...@gmail.com wrote:
Hi Siyuan,
I wonder if you --master yarn-cluster
but the SQL command is throwing an error? No errors but no output either?
TD
On Mon, Jul 14, 2014 at 4:06 PM, hsy...@gmail.com hsy...@gmail.com
wrote:
Hi All,
A couple of days ago, I tried to integrate SQL and streaming together. My understanding is that I can transform the RDDs from a DStream to SchemaRDDs and execute
that to work, then I would test the Spark SQL
stuff.
TD
On Mon, Jul 14, 2014 at 5:25 PM, hsy...@gmail.com hsy...@gmail.com
wrote:
No errors but no output either... Thanks!
On Mon, Jul 14, 2014 at 4:59 PM, Tathagata Das
tathagata.das1...@gmail.com wrote:
Could you elaborate on what
:21 AM, hsy...@gmail.com hsy...@gmail.com
wrote:
Hi guys,
I'm a new user to Spark. I would like to know: is there an example of how to use Spark SQL and Spark Streaming together? My use case is I want to do some SQL on the input stream from Kafka.
Thanks!
Best,
Siyuan
I have a newbie question. What is the difference between SparkSQL and Shark?
Best,
Siyuan
Hi guys,
I'm a new user to Spark. I would like to know: is there an example of how to use Spark SQL and Spark Streaming together? My use case is I want to do some SQL on the input stream from Kafka.
Thanks!
Best,
Siyuan