awCounts" ) ? I
> expected Spark to manage the cache automatically, given that I do
> not explicitly call cache().
>
> How come I do not get a similar warning from:
>
> sampleSDF.createOrReplaceTempView( "sample" )
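For contrast, the caching can be made explicit rather than left to Spark's discretion. A minimal sketch, assuming `spark` is an active SparkSession and `sampleSDF` is the DataFrame from the thread above:

```scala
sampleSDF.cache()                               // mark the DataFrame for caching
sampleSDF.createOrReplaceTempView("sample")     // register it as a SQL view
spark.sql("SELECT COUNT(*) FROM sample").show() // first action materializes the cache
```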
Hi
I am using Airflow in such a scenario.
Hello Users,
Is there any alternative to https://github.com/databricks/spark-redshift on
Scala 2.12.x?
Thanks
--
*Jun Zhu*
Sr. Engineer I, Data
+86 18565739171
://ip-172-19-104-48.ec2.internal:9083
> 19/06/04 05:58:18 INFO HiveMetaStoreClient: Opened a connection to
> metastore, current connections: 1
> 19/06/04 05:58:18 INFO HiveMetaStoreClient: Connected to metastore.
> 19/06/04 05:58:18 INFO RetryingMetaStoreClient: RetryingMetaStoreClient
(1), None)], false,
> false, false
> *19/06/04 05:50:15* INFO SparkExecuteStatementOperation: Result Schema:
> StructType(StructField(plan,StringType,true))
Had set Thrift Server miniresource (10 instances) and initresource (10) on
YARN.
Any thoughts? Any config issue that may relate?
Never mind, I got the point: Spark replaces Hive's Parquet support with its own.
Should set spark.sql.hive.convertMetastoreParquet=false to use Hive's.
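For anyone hitting the same plan mismatch, a hedged sketch of where that flag can be passed when launching the Thrift Server (script location as in a stock Spark distribution):

```shell
# Pass the flag at startup so Hive's own Parquet SerDe is used
# instead of Spark's native Parquet reader:
sbin/start-thriftserver.sh \
  --conf spark.sql.hive.convertMetastoreParquet=false
```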
Thanks
On Thu, Apr 25, 2019 at 5:00 PM Jun Zhu wrote:
> Hi,
> We are using plugins from Apache Hudi, which define a custom Hive external
'com.uber.hoodie.hadoop.HoodieInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 's3a://vungle2-dataeng/jun-test/stage20190424new'
It works when queried in spark-shell, but not in the Spark Thrift Server with
the same config.
After debugging, found:
the spark-shell execution plan differs from
. Thank you very much!
Best,
Jun
an prevent the ever-increasing shuffle data
storage for a computation that takes many iterations?
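One common way to keep an iterative job from accumulating unbounded lineage and shuffle files is periodic checkpointing. A sketch, assuming `sc` is a live SparkContext and an HDFS checkpoint directory is available (the path and the loop body are hypothetical stand-ins):

```scala
sc.setCheckpointDir("hdfs:///tmp/checkpoints")   // hypothetical path

var data = sc.parallelize(1 to 1000000).map(x => (x % 100, x.toLong))
for (i <- 1 to 50) {
  data = data.reduceByKey(_ + _).mapValues(_ + 1) // stand-in for the real iteration
  if (i % 10 == 0) {
    data.checkpoint()  // truncate lineage every 10 iterations
    data.count()       // force the checkpoint to materialize
  }
}
```

Once a checkpoint materializes, the RDD's lineage is cut, so earlier shuffle files become eligible for cleanup instead of growing without bound.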
Jun
Hi Andy,
Is there any method to convert an IPython notebook file (.ipynb) to a Spark Notebook
file (.snb), or vice versa?
BR
Jun
At 2015-07-13 02:45:57, andy petrella andy.petre...@gmail.com wrote:
Heya,
You might be looking for something like this I guess:
https://www.youtube.com/watch?v
of operations, then there will be a lot of shuffle data. So you need
to check the worker logs and see what happened (whether the disk is full, etc.).
We have streaming pipelines running for weeks without having any issues.
Thanks
Best Regards
On Mon, Mar 16, 2015 at 12:40 PM, Jun Yang yangjun...@gmail.com
Guys,
We have a project which builds upon Spark Streaming.
We use Kafka as the input stream and create 5 receivers.
When this application had run for around 90 hours, all 5 receivers failed
for unknown reasons.
In my understanding, it is not guaranteed that a Spark Streaming receiver
will
spawn another receiver on another machine or on the same machine.
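The receiver setup under discussion can be sketched as follows (Spark 1.x receiver-based Kafka API; the ZooKeeper quorum, group id, and topic name are placeholders):

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(sc, Seconds(10))

// Five receivers, one per createStream call; names are placeholders.
val streams = (1 to 5).map { _ =>
  KafkaUtils.createStream(ssc, "zk1:2181", "my-group", Map("events" -> 1))
}
val unified = ssc.union(streams) // one DStream over all 5 receivers
```

If a receiver's executor dies, the driver is responsible for scheduling a replacement receiver, on the same machine or another one.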
Thanks
Best Regards
On Mon, Mar 16, 2015 at 1:08 PM, Jun Yang yangjun...@gmail.com wrote:
Dibyendu,
Thanks for the reply.
I am reading your project homepage now.
One quick question I care about is:
If the receivers
Guys,
I have a question regarding Spark 1.1's broadcast implementation.
In our pipeline, we have a large multi-class LR model, which is about 1GiB
in size.
To exploit Spark's parallelism, a natural approach is to
broadcast this model file to the worker nodes.
However, it looks that
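The broadcast approach described above can be sketched like this (the loader, the feature RDD, and the scoring logic are hypothetical stand-ins for the real pipeline):

```scala
// Broadcast the large model once, instead of shipping it with every task.
val model: Array[Double] = loadModelSomehow()  // hypothetical loader for the ~1GiB weights
val bcModel = sc.broadcast(model)

val scores = features.map { x: Array[Double] =>
  val w = bcModel.value // read-only copy, fetched once per executor
  x.zip(w).map { case (xi, wi) => xi * wi }.sum
}
```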
Hi, guys
I tried to run a Spark Streaming job with Kafka on YARN.
My business logic is very simple:
just listen on a Kafka topic and write the DStream to HDFS on each batch interval.
For a few hours after launching the streaming job, it worked well. However, it suddenly died,
killed by the ResourceManager.
ResourceManager
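The batch-to-HDFS step of the job described above can be sketched as (the output path is a placeholder, and `dstream` stands in for the Kafka DStream):

```scala
// Write each non-empty micro-batch to its own HDFS directory,
// keyed by the batch time.
dstream.foreachRDD { (rdd, time) =>
  if (!rdd.isEmpty())
    rdd.saveAsTextFile(s"hdfs:///data/events/batch-${time.milliseconds}")
}
```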
Guys,
As to the question of pre-processing, you could just migrate your logic to
Spark before using K-means.
I have only used Scala on Spark and haven't used the Python bindings, but I
think the basic steps must be the same.
BTW, if your data set is big with a huge sparse feature dimension
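For the sparse case, a sketch with MLlib's sparse vectors (RDD API of that era; the input path, "index:value" file format, and feature dimension are made up for illustration):

```scala
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Hypothetical "index:value" input, one point per line;
// the dimension (1,000,000) is made up for illustration.
val points = sc.textFile("hdfs:///input/features.txt").map { line =>
  val kv = line.split(" ").map(_.split(":"))
  Vectors.sparse(1000000, kv.map(_(0).toInt), kv.map(_(1).toDouble))
}.cache()

val model = KMeans.train(points, 20, 10) // k = 20, 10 iterations
```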
Guys,
Recently we have been migrating our backend pipeline to Spark.
In our pipeline, we have an MPI-based HAC implementation; to ensure the
result consistency of the migration, we also want to migrate this
MPI-implemented code to Spark.
However, during the migration process, I found that there are
Original Message
Subject: unsubscribe
From: Nabeel Memon nm3...@gmail.com
To: user@spark.apache.org
Cc:
unsubscribe