How to know that a partition is ready when using Structured Streaming

2019-01-16 Thread Wayne Guo
When using Structured Streaming, we use the "partitionBy" API to partition the output data and an event-time watermark to handle late records, but how can we tell downstream users that a partition is ready? For example, when should we write an empty "hadoop.done" file into a partition directory?
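One possible approach (a sketch only, not something stated in the thread): register a StreamingQueryListener, read the watermark from each progress event, and drop the marker file once the watermark has passed the end of a partition's time range. The event_date=YYYY-MM-DD layout, the output root, and the "hadoop.done" name below are illustrative assumptions.

import java.time.Instant
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

// Hypothetical listener: once the event-time watermark has moved past a whole
// day, write an empty "hadoop.done" marker into that day's partition directory.
class DoneMarkerListener(spark: SparkSession, outputRoot: String) extends StreamingQueryListener {
  override def onQueryStarted(e: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(e: QueryTerminatedEvent): Unit = ()

  override def onQueryProgress(e: QueryProgressEvent): Unit = {
    // progress.eventTime holds a "watermark" entry (ISO-8601 string) once one exists
    Option(e.progress.eventTime.get("watermark")).foreach { wm =>
      val watermarkDay = Instant.parse(wm).toString.take(10)   // e.g. "2019-01-15"
      val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
      fs.listStatus(new Path(outputRoot))
        .filter(_.isDirectory)
        .map(_.getPath)
        .filter(dir => dir.getName.stripPrefix("event_date=") < watermarkDay) // days fully below the watermark
        .foreach(dir => fs.createNewFile(new Path(dir, "hadoop.done")))       // no-op if it already exists
    }
  }
}

// spark.streams.addListener(new DoneMarkerListener(spark, "/data/out"))

Downstream jobs would then treat a partition as complete only when the marker exists; records arriving later than the watermark allows are dropped by the query anyway.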

Subscribe

2019-01-16 Thread Vasu Devan

Re: How to force-quit a Spark application?

2019-01-16 Thread Marcelo Vanzin
Those are daemon threads and not the cause of the problem. The main thread is waiting for the "org.apache.hadoop.util.ShutdownHookManager" thread, but I don't see that one in your list.

Re: How to force-quit a Spark application?

2019-01-16 Thread Pola Yao
Hi Marcelo, Thanks for your response. I have dumped the threads on the server where I submitted the Spark application: ''' ... "dispatcher-event-loop-2" #28 daemon prio=5 os_prio=0 tid=0x7f56cee0e000 nid=0x1cb6 waiting on condition [0x7f5699811000] java.lang.Thread.State: WAITING

Re: How to force-quit a Spark application?

2019-01-16 Thread Marcelo Vanzin
If System.exit() doesn't work, you may have a bigger problem somewhere. Check your threads (using e.g. jstack) to see what's going on.

Re: How to force-quit a Spark application?

2019-01-16 Thread Pola Yao
Hi Marcelo, Thanks for your reply! It made sense to me. However, I've tried many ways to exit the Spark application (e.g., System.exit()), but they failed. Is there an explicit way to shut down all the live threads in the Spark application and then quit afterwards?
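For what it's worth, a minimal sketch of the pattern suggested in this thread (stop Spark first, then exit; nothing here is specific to the original application):

import org.apache.spark.sql.SparkSession

object ForceQuitExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("force-quit-example").getOrCreate()
    try {
      // ... application logic ...
    } finally {
      // Stop the SparkContext first so its threads and shutdown hooks can run,
      // then exit the JVM normally.
      spark.stop()
      sys.exit(0)
    }
  }
}

// If the JVM still hangs after this, a stuck shutdown hook (e.g. the Hadoop
// ShutdownHookManager thread mentioned above) is the usual suspect; as a last
// resort, Runtime.getRuntime.halt(0) terminates without running shutdown hooks.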

Re: How to unsubscribe???

2019-01-16 Thread Trevor News
Hi Junior, After you send an email to user-unsubscr...@spark.apache.org, you should receive an email with instructions to double-confirm. You will be asked to send another email using the link in that second email. Only when that step is complete will the unsubscribe take effect. Please check your

Re: [ANNOUNCE] Announcing Apache Spark 2.2.3

2019-01-16 Thread Takeshi Yamamuro
Thanks, Dongjoon!

Re: cache table vs. parquet table performance

2019-01-16 Thread Jörn Franke
I believe the in-memory solution misses the storage indexes that Parquet / ORC have. The in-memory solution is more suitable if you iterate over the whole data set frequently. > On 15.01.2019 at 19:20, Tomas Bartalos wrote: > > Hello, > > I'm using spark-thrift server and I'm searching
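To make the trade-off concrete, here is a small hedged sketch (table name and paths are made up, not from the thread): the first query reads Parquet directly and can push the filter down to the format's min/max statistics; the second relies on Spark's in-memory columnar cache, which mainly pays off when the same data is scanned repeatedly.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cache-vs-parquet").getOrCreate()

// Filter pushed down to Parquet: row groups whose min/max range excludes the
// predicate value can be skipped without being read from disk.
val fromParquet = spark.read.parquet("/data/events")
  .where("event_date = '2019-01-15'")

// In-memory cache: materializes the table in Spark's columnar cache so that
// repeated scans avoid hitting storage at all.
spark.read.parquet("/data/events").createOrReplaceTempView("events")
spark.sql("CACHE TABLE events")
val fromCache = spark.table("events").where("event_date = '2019-01-15'")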

Re: cache table vs. parquet table performance

2019-01-16 Thread Todd Nist
Hi Tomas, Have you considered using something like https://www.alluxio.org/ for your cache? Seems like a possible solution for what you're trying to do. -Todd On Tue, Jan 15, 2019 at 11:24 PM 大啊 wrote: > Hi, Tomas. > Thanks for your question, it gave me a prompt. But the best way to use cache >

[Spark SQL]: how does "Exchange hashpartitioning" work in Spark

2019-01-16 Thread nkx
Hi, I have a dataset that I want to write sorted into Parquet files, so that I can benefit from predicate pushdown when querying those files later with Spark. Currently I use repartition with a column and a number of partitions to move the data to the particular partition. The column is
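A hedged sketch of that layout (column name, partition count, and paths are placeholders, not taken from the question): repartitioning by the filter column is what produces the "Exchange hashpartitioning" node in the plan, and sorting within each partition keeps the per-file min/max ranges tight so predicate pushdown can skip files and row groups.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("sorted-parquet").getOrCreate()

// Hash-partition by the column used in later filters, then sort inside each
// partition before writing, so each Parquet file covers a narrow value range.
spark.read.parquet("/data/raw")
  .repartition(200, col("id"))
  .sortWithinPartitions("id")
  .write
  .parquet("/data/sorted")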

Re: [ANNOUNCE] Announcing Apache Spark 2.2.3

2019-01-16 Thread Hyukjin Kwon
Nice! On Wed, Jan 16, 2019 at 11:55 AM, Jiaan Geng wrote: > Glad to hear this.

How to unsubscribe???

2019-01-16 Thread Junior Alvarez
Hi! I've been sending unsubscribe mails to this address: user-unsubscr...@spark.apache.org, for the last few months, and I still don't manage to unsubscribe... Why??? B r /Junior

Unsubscribe

2019-01-16 Thread Deepak Sahoo
Unsubscribe

How the thriftserver loads data

2019-01-16 Thread Soheil Pourbafrani
Hi, I want to write an application that loads data from HDFS into tables, creates a ThriftServer, and submits it to the YARN cluster. The question is how Spark actually loads the data. Does Spark load the data into memory as soon as the application starts, or does it wait for a query and just load data according
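A hedged sketch of one way to set this up (paths and the table name are invented, and it assumes spark-hive-thriftserver is on the classpath). The point relevant to the question: registering a view reads nothing, CACHE TABLE materializes data eagerly, and without a cache Spark only scans HDFS when a query arrives over JDBC/ODBC.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

val spark = SparkSession.builder()
  .appName("embedded-thriftserver")
  .enableHiveSupport()
  .getOrCreate()

// Registering the view is lazy: no HDFS data is read yet.
spark.read.parquet("hdfs:///data/events").createOrReplaceTempView("events")

// CACHE TABLE is eager, so this scan happens now; use CACHE LAZY TABLE to
// defer materialization until the first query touches the table.
spark.sql("CACHE TABLE events")

// Expose the session's tables over the Thrift JDBC/ODBC interface.
HiveThriftServer2.startWithContext(spark.sqlContext)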