Re: INFO CreateViewCommand:57 - Try to uncache `rawCounts` before replacing.

2021-12-21 Thread Jun Zhu
awCounts" ) ? I > expected to manage spark to manage the cache automatically given that I do > not explicitly call cache(). > > > > > > How come I do not get a similar warning from? > > sampleSDF.createOrReplaceTempView( "sample" ) &

Re: Spark batch job chaining

2020-08-09 Thread Jun Zhu
Hi, I am using Airflow in such a scenario.
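
For what it's worth, a minimal sketch of chaining two Spark batch jobs with Airflow, in the spirit of this reply (Airflow 1.10-era import; the DAG id, schedule, and script paths are hypothetical):

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.10-era path

with DAG(
    dag_id="spark_batch_chain",       # hypothetical DAG id
    start_date=datetime(2020, 8, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="extract",
        bash_command="spark-submit /jobs/extract.py",    # hypothetical job
    )
    transform = BashOperator(
        task_id="transform",
        bash_command="spark-submit /jobs/transform.py",  # hypothetical job
    )
    # transform runs only after extract succeeds, chaining the batch jobs.
    extract >> transform
```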

Alternative for spark-redshift on scala 2.12

2020-05-05 Thread Jun Zhu
Hello Users, Is there any alternative for https://github.com/databricks/spark-redshift on scala 2.12.x? Thanks -- Jun Zhu, Sr. Engineer I, Data
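
One hedged alternative, if no Scala 2.12 build of spark-redshift is available: read over plain JDBC. The URL, table, and credentials below are placeholders, and the Amazon Redshift JDBC driver is assumed to be on the classpath. Note this loses spark-redshift's parallel UNLOAD-to-S3 path, so it suits smaller tables.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (spark.read.format("jdbc")
      .option("url", "jdbc:redshift://examplecluster.example.com:5439/dev")  # placeholder
      .option("dbtable", "public.events")                                    # placeholder
      .option("user", "username")                                            # placeholder credentials
      .option("password", "password")
      .option("driver", "com.amazon.redshift.jdbc42.Driver")
      .load())
df.show()
```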

Re: Spark Thriftserver on YARN, SQL submit takes a long time.

2019-06-04 Thread Jun Zhu
://ip-172-19-104-48.ec2.internal:9083 19/06/04 05:58:18 INFO HiveMetaStoreClient: Opened a connection to metastore, current connections: 1 19/06/04 05:58:18 INFO HiveMetaStoreClient: Connected to metastore. 19/06/04 05:58:18 INFO RetryingMetaStoreClient: RetryingMetaStoreClient

Spark Thriftserver on YARN, SQL submit takes a long time.

2019-06-04 Thread Jun Zhu
(1), None)], false, false, false 19/06/04 05:50:15 INFO SparkExecuteStatementOperation: Result Schema: StructType(StructField(plan,StringType,true)) We had set the thrift server's min resources (10 instances) and initial resources (10) on YARN. Any thoughts? Any config issue that may relate
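
Assuming "min resources(10)" and "init resources(10)" refer to dynamic-allocation executor counts (an assumption, since the thread truncates before the exact config names), these are the standard keys. For the thrift server they would normally be passed to start-thriftserver.sh as --conf flags rather than set in code; a sketch:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.dynamicAllocation.enabled", "true")
         .config("spark.dynamicAllocation.minExecutors", "10")
         .config("spark.dynamicAllocation.initialExecutors", "10")
         .getOrCreate())
```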

Re: Different query result between spark thrift server and spark-shell

2019-04-25 Thread Jun Zhu
Never mind, I got the point: Spark replaces Hive's parquet support with its own. Should set spark.sql.hive.convertMetastoreParquet=false to use Hive's. Thanks On Thu, Apr 25, 2019 at 5:00 PM Jun Zhu wrote: Hi, We are using plugins from Apache Hudi which self-define a Hive external
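
A sketch of applying the fix, either at session build time or per session via SET (both forms use the documented spark.sql.hive.convertMetastoreParquet key):

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .enableHiveSupport()
         .config("spark.sql.hive.convertMetastoreParquet", "false")
         .getOrCreate())

# Or per session, e.g. through a thrift server connection:
spark.sql("SET spark.sql.hive.convertMetastoreParquet=false")
```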

Different query result between spark thrift server and spark-shell

2019-04-25 Thread Jun Zhu
'com.uber.hoodie.hadoop.HoodieInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://vungle2-dataeng/jun-test/stage20190424new' It works when queried in spark-shell, but not in the Spark thrift server with the same config. After debugging I found: the spark-shell execution plan differs from
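
A sketch reconstructing the truncated DDL above as it would run through spark.sql. The table name and column list are placeholders; the SerDe is assumed (it is the usual one for Hudi-registered tables), while the input/output formats and location are the ones quoted in the thread.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS hudi_stage (id STRING, payload STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    STORED AS
      INPUTFORMAT 'com.uber.hoodie.hadoop.HoodieInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
    LOCATION 's3a://vungle2-dataeng/jun-test/stage20190424new'
""")
```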

Fwd: Does pyspark support python3.6?

2017-11-01 Thread Jun Shi
Thank you very much! Best, Jun

How to design the Spark application so that shuffle data will be automatically cleaned up after some iterations

2015-09-05 Thread Jun Li
can prevent the ever-increasing shuffle data storage for computation that takes many iterations? Jun
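
The standard pattern for this: shuffle files are cleaned by the ContextCleaner once the RDDs that produced them are garbage-collected, so periodically checkpointing (and reassigning the variable so old RDD references are dropped) lets earlier shuffle data become unreferenced. A minimal sketch, with arbitrary iteration counts and a placeholder checkpoint directory:

```python
from pyspark import SparkContext

sc = SparkContext(appName="iterative-job")
sc.setCheckpointDir("hdfs:///tmp/checkpoints")  # placeholder path

rdd = sc.parallelize(range(1000)).map(lambda x: (x % 10, x))

for i in range(100):
    # Each iteration introduces one shuffle (reduceByKey).
    rdd = rdd.mapValues(lambda v: v + 1).reduceByKey(lambda a, b: a + b)
    if i % 10 == 9:
        # Truncate the lineage so earlier shuffle files become unreferenced
        # and eligible for cleanup after the driver GCs the old RDD objects.
        rdd.checkpoint()
        rdd.count()  # force materialization of the checkpoint
```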

Re: Re: Real-time data visualization with Zeppelin

2015-08-06 Thread jun
Hi Andy, Is there any method to convert an IPython notebook file (.ipynb) to a Spark Notebook file (.snb), or vice versa? BR Jun At 2015-07-13 02:45:57, andy petrella andy.petre...@gmail.com wrote: Heya, You might be looking for something like this I guess: https://www.youtube.com/watch?v

Re: Question about Spark Streaming Receiver Failure

2015-03-16 Thread Jun Yang
of operations, then there will be a lot of shuffle data. So you need to check the worker logs and see what happened (whether the disk is full, etc.). We have streaming pipelines running for weeks without having any issues. Thanks Best Regards On Mon, Mar 16, 2015 at 12:40 PM, Jun Yang yangjun...@gmail.com

Question about Spark Streaming Receiver Failure

2015-03-16 Thread Jun Yang
Guys, We have a project which builds upon Spark streaming. We use Kafka as the input stream and create 5 receivers. When this application had run for around 90 hours, all 5 receivers failed for some unknown reason. In my understanding, it is not guaranteed that a Spark streaming receiver will
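
For context, the receiver setup the thread describes looks roughly like this in Spark 1.x PySpark: several receiver-based Kafka streams unioned into one DStream. The ZooKeeper quorum, group id, topic, and batch interval are placeholders.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="kafka-receivers")
ssc = StreamingContext(sc, 10)  # 10-second batches, arbitrary

# Five receiver-based input streams, unioned into one DStream.
streams = [
    KafkaUtils.createStream(ssc, "zk1:2181", "my-group", {"my-topic": 1})
    for _ in range(5)
]
unified = ssc.union(*streams)
unified.count().pprint()

ssc.start()
ssc.awaitTermination()
```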

Re: Question about Spark Streaming Receiver Failure

2015-03-16 Thread Jun Yang
On Mon, Mar 16, 2015 at 12:40 PM, Jun Yang yangjun...@gmail.com wrote: Guys, We have a project which builds upon Spark streaming. We use Kafka as the input stream and create 5 receivers. When this application had run for around 90 hours, all 5 receivers failed for some unknown reasons

Re: Question about Spark Streaming Receiver Failure

2015-03-16 Thread Jun Yang
spawn another receiver on another machine or on the same machine. Thanks Best Regards On Mon, Mar 16, 2015 at 1:08 PM, Jun Yang yangjun...@gmail.com wrote: Dibyendu, Thanks for the reply. I am reading your project homepage now. One quick question I care about is: If the receivers

Is It Feasible for Spark 1.1 Broadcast to Fully Utilize the Ethernet Card Throughput?

2015-01-09 Thread Jun Yang
Guys, I have a question regarding the Spark 1.1 broadcast implementation. In our pipeline, we have a large multi-class LR model, which is about 1GiB in size. To exploit Spark's parallelism, a natural approach is to broadcast this model file to the worker nodes. However, it looks like
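
A minimal sketch of the approach described, with a small stand-in for the ~1GiB model (the model structure and scoring function are placeholders):

```python
from pyspark import SparkContext

sc = SparkContext(appName="broadcast-model")

model = {"weights": [0.1] * 1000}  # stand-in for the ~1GiB LR model
bmodel = sc.broadcast(model)       # shipped to each executor once, not per task

def score(x):
    # Tasks read the model through .value; the driver copy is not
    # serialized into every task closure.
    return x * bmodel.value["weights"][0]

print(sc.parallelize(range(100)).map(score).collect())
```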

KafkaReceiver executor in spark streaming job on YARN suddenly killed by ResourceManager

2015-01-02 Thread Jun Ki Kim
Hi guys, I tried to run a Spark streaming job with Kafka on YARN. My business logic is very simple: just listen on a Kafka topic and write the DStream to HDFS on each batch iteration. For a few hours after launching the streaming job, it works well. However, it then suddenly gets killed by the ResourceManager. ResourceManager
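
A sketch of the job as described, using the Spark 1.x streaming API; the topic, quorum, batch interval, and output path are placeholders.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="kafka-to-hdfs")
ssc = StreamingContext(sc, 60)  # one-minute batches, arbitrary

stream = KafkaUtils.createStream(ssc, "zk1:2181", "hdfs-writer", {"events": 1})
# saveAsTextFiles creates one output directory per batch under the prefix.
stream.map(lambda kv: kv[1]).saveAsTextFiles("hdfs:///data/events/batch")

ssc.start()
ssc.awaitTermination()
```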

Re: k-means clustering

2014-11-20 Thread Jun Yang
Guys, As to the questions of pre-processing, you could just migrate your logic to Spark before using K-means. I have only used Scala on Spark and haven't used the Python binding, but I think the basic steps must be the same. BTW, if your data set is big with a huge sparse feature dimension
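
A minimal sketch of the suggested flow in MLlib, shown in PySpark although the reply used Scala; the sparse vectors, dimension, and cluster count are placeholders.

```python
from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans
from pyspark.mllib.linalg import Vectors

sc = SparkContext(appName="kmeans-sparse")

# Pre-processing (tokenizing, hashing, TF-IDF, ...) would happen here;
# the sparse vectors below are placeholders.
data = sc.parallelize([
    Vectors.sparse(10000, {0: 1.0, 42: 3.0}),
    Vectors.sparse(10000, {7: 2.0, 9999: 1.0}),
])
model = KMeans.train(data, k=2, maxIterations=10)
print(model.clusterCenters)
```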

Questions Regarding to MPI Program Migration to Spark

2014-11-16 Thread Jun Yang
Guys, We are currently migrating our backend pipeline to Spark. In our pipeline, we have an MPI-based HAC implementation; to ensure result consistency across the migration, we also want to migrate this MPI-implemented code to Spark. However, during the migration process, I found that there are

unsubscribe

2014-05-04 Thread ZHANG Jun
Original message. Subject: unsubscribe From: Nabeel Memon nm3...@gmail.com To: user@spark.apache.org Cc: unsubscribe