okay what is the difference between keeping set hive.execution.engine=spark
and
running the script through hivecontext.sql?
On Mar 9, 2017 8:52 AM, "ayan guha" wrote:
> Hi
>
> Subject to your version of Hive & Spark, you may want to set
>
Hi
Subject to your version of Hive & Spark, you may want to set
hive.execution.engine=spark as a beeline command line parameter, assuming you
are running hive scripts using the beeline command line (which is the suggested
practice for security purposes).
On Thu, Mar 9, 2017 at 2:09 PM, nancy henry
Hi Team,
Basically we have all our data as Hive tables, and we have been processing it until
now in Hive on MR. Now that we have HiveContext, which can run Hive queries on
Spark, we are making all these complex Hive scripts run using a
hivecontext.sql(sc.textfile(hivescript)) kind of approach, i.e. basically
running
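A minimal sketch of that approach in Java, assuming a Spark 1.x HiveContext, a hypothetical script path, and semicolon-separated statements (the path, class name, and app name here are illustrative, not from the thread); the script is read on the driver because hiveContext.sql() takes a plain SQL string, not an RDD:
code:
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class RunHiveScript {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("hive-script-on-spark");
        JavaSparkContext sc = new JavaSparkContext(conf);
        HiveContext hiveContext = new HiveContext(sc.sc());

        // Read the whole script on the driver (hypothetical path); sql() needs
        // a String, so sc.textFile() is not used here.
        String script = new String(
                Files.readAllBytes(Paths.get("/path/to/hivescript.hql")));

        // Naive split on ';' -- assumes no semicolons inside string literals.
        for (String stmt : script.split(";")) {
            if (!stmt.trim().isEmpty()) {
                hiveContext.sql(stmt.trim());
            }
        }
        sc.stop();
    }
}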
code:
directKafkaStream.foreachRDD(rdd -> {
    rdd.foreach(record -> {
        messages1.add(record._2);
    });
    JavaRDD lines = sc.parallelize(messages1);
IIUC, your scenario is quite like what ReliableKafkaReceiver currently
does. You can only send an ack to the upstream source after the WAL is
persisted; otherwise, because data processing and data receiving are
asynchronous, there is still a chance data could be lost if you send out the
ack before the WAL write.
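A minimal sketch of that ordering with the Receiver API, using a purely hypothetical UpstreamClient to stand in for a replayable, non-Kafka source; with spark.streaming.receiver.writeAheadLog.enable=true, store(iterator) returns only after the block has been stored (including the WAL write), so the ack goes out afterwards:
code:
import java.util.List;

import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.receiver.Receiver;

public class AckAfterStoreReceiver extends Receiver<String> {

    private final String upstreamUrl;  // hypothetical endpoint of the replayable source

    public AckAfterStoreReceiver(String upstreamUrl) {
        super(StorageLevel.MEMORY_AND_DISK_SER());
        this.upstreamUrl = upstreamUrl;
    }

    @Override
    public void onStart() {
        new Thread(this::receive).start();
    }

    @Override
    public void onStop() { }

    private void receive() {
        UpstreamClient client = UpstreamClient.connect(upstreamUrl); // hypothetical client
        while (!isStopped()) {
            List<String> batch = client.pull();
            // Blocks until Spark has stored the block; with the WAL enabled
            // this includes persisting the block to the write-ahead log.
            store(batch.iterator());
            // Ack only after store() has returned, so the source can replay
            // the batch if storing (and the WAL write) never completed.
            client.ack(batch);
        }
    }

    // Hypothetical stand-in for the replayable, non-Kafka upstream source.
    public interface UpstreamClient {
        static UpstreamClient connect(String url) {
            throw new UnsupportedOperationException("stub");
        }
        List<String> pull();          // fetch the next batch (blocking)
        void ack(List<String> batch); // confirm the batch will not need replay
    }
}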
Thanks for the feedback everyone. We've had a look at different SQL based
solutions, and have got good performance out of them, but some of the
reports we make can't be generated with a single bit of SQL. This is just
an investigation to see if Spark is a viable alternative.
I've got another
Thank you Liu. Can you please explain what you mean by enabling Spark's
fault-tolerance mechanism?
I observed that after all tasks finish, Spark works on concatenating the
same partitions from all tasks on the file system, e.g.,
task1 - partition1, partition2, partition3
task2 - partition1,
I was talking about the Kafka binary used to run the Kafka server
(broker). The version of that binary is kafka_2.10-0.8.2.1, while Spark
2.0.2 is built with Scala 2.11. So the Kafka connector that Spark
uses internally to communicate with the broker is also built with Scala
Hello,
I'm running JavaRDD.count() repeatedly on a small RDD, and it seems to
increase the size of the Java heap over time until the default limit
is reached and an OutOfMemoryException is thrown. I'd expect this
program to run in constant space, and the problem carries over to some
more
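A minimal sketch of that kind of loop, with an illustrative tiny RDD and a local master (nothing here is taken from the original program); the expectation described above is that driver heap usage stays roughly constant across iterations:
code:
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RepeatedCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("repeated-count")
                .setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // A deliberately small RDD; each count() below runs one small job.
        JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

        // Count the same RDD repeatedly until interrupted, printing progress
        // occasionally so heap growth can be watched from outside (e.g. jstat).
        for (long i = 0; ; i++) {
            long n = rdd.count();
            if (i % 10000 == 0) {
                System.out.println("iteration " + i + ", count = " + n);
            }
        }
    }
}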
Hi All,
I am using a receiver-based approach, and I understand that the Spark Streaming
APIs will convert the data received from the receiver into blocks, and these
in-memory blocks are also stored in the WAL if one enables it. My upstream
source, which is not Kafka, can also replay, by which I mean if
OK, I found the problem. There was a typo in my configuration; as a result,
executor dynamic allocation was not disabled, so executors were getting
killed and requested from time to time. All good now.
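For reference, a minimal sketch of setting this explicitly in the application's SparkConf (the configuration keys are the standard Spark ones; the app name and executor count are illustrative):
code:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class StaticAllocationApp {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("static-allocation")
                // A typo in this key name means dynamic allocation silently
                // stays at whatever the cluster defaults provide.
                .set("spark.dynamicAllocation.enabled", "false")
                // With dynamic allocation off, a fixed executor count applies.
                .set("spark.executor.instances", "10");

        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... job logic ...
        sc.stop();
    }
}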
On Wed, Mar 8, 2017 at 2:45 PM, TheGeorge1918 .
wrote:
> Hello all,
>
Hello all,
I was running a Spark job and some executors failed without error info.
The executors were dead and new executors were requested, but on the Spark
web UI no failure was found. Normally, if it were a memory issue, I could find
OOM there, but not this time.
Configuration:
1. each executor has
Hi,
I'm trying to read an S3 bucket from Spark, and up until today Spark has always
complained that the request returns a 403:
hadoopConf = spark_context._jsc.hadoopConfiguration()
hadoopConf.set("fs.s3a.access.key", "ACCESSKEY")
hadoopConf.set("fs.s3a.secret.key", "SECRETKEY")
My guess is that the UI serialization times show the Java side only. To get
a feeling for the Python pickling/unpickling, use the show_profiles()
method of the SparkContext instance:
http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.SparkContext.show_profiles
That will show you