Answering my own question. Looks like this can be done by implementing
SparkListener with method
def onExecutorAdded(executorAdded: SparkListenerExecutorAdded): Unit
as the SparkListenerExecutorAdded object has the info.
--
Nick
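For anyone finding this later, a minimal Java sketch of that listener, assuming the Spark 2.3 listener API (the class name and the registration variable are illustrative):

import org.apache.spark.scheduler.SparkListener;
import org.apache.spark.scheduler.SparkListenerExecutorAdded;

public class ExecutorTracker extends SparkListener {
    @Override
    public void onExecutorAdded(SparkListenerExecutorAdded executorAdded) {
        // SparkListenerExecutorAdded carries the executor ID and an
        // ExecutorInfo holding the host.
        String id = executorAdded.executorId();
        String host = executorAdded.executorInfo().executorHost();
        System.out.println("Executor added: id=" + id + ", host=" + host);
    }
}

// Register it on the SparkContext, e.g. from a JavaSparkContext named jsc:
// jsc.sc().addSparkListener(new ExecutorTracker());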
Hi,
I'm using Spark 2.3 and looking for an API in Java to fetch the list of
executors. I need host and ID info for the executors.
Thanks for any pointers,
--
Nick
Hi,
I think I'm seeing a bug in the context of upgrading to using the Kafka 0.10
streaming API. Code fragments follow.
--
Nick
// Note: the generic type parameters below were stripped by the archive;
// reconstructed here assuming String keys and values.
JavaInputDStream<ConsumerRecord<String, String>> rawStream =
    getDirectKafkaStream();
JavaDStream<Tuple2<String, String>> messagesTuple =
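For context, a sketch of what the getDirectKafkaStream() helper above might look like with the 0.10 integration; the broker address, topic, and group id are placeholders, not from the original post:

import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaStreamFactory {
    static JavaInputDStream<ConsumerRecord<String, String>> getDirectKafkaStream(
            JavaStreamingContext jssc) {
        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker:9092");  // placeholder
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "example-group");         // placeholder
        Collection<String> topics = Arrays.asList("events");  // placeholder
        // The 0.10 direct stream yields ConsumerRecord values rather than
        // the (key, message) tuples of the 0.8 API.
        return KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));
    }
}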
Hello,
Looks like the API docs linked from the Spark Kafka 0.10 Integration page are
not current.
For instance, on the page
https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html
the code examples show the new API (i.e., the ConsumerStrategies class). However,
Hi,
We got the following exception trying to initiate a REST call from the Spark
app.
This is running Spark 1.5.2 in AWS / Yarn. It's only happened once during
the course of a streaming app that has been running for months.
Just curious if anyone could shed some more light on root
public class RDDMultipleTextOutputFormat
        extends MultipleTextOutputFormat<String, String> {

    @Override
    protected String generateFileNameForKeyValue(String key, String value,
            String name) {
        // Use the key as the output file name.
        return key;
    }

    @Override
    protected String generateActualKey(String key, String value) {
        // Suppress the key in the file contents; only values are written.
        return "";
    }
}
____
From: Afshartous, Nick <nafshart..
...parallelize(Arrays.asList(strings))
.mapToPair(pairFunction)
.saveAsHadoopFile("s3://...", String.class, String.class,
RDDMultipleTextOutputFormat.class);
From: Nicholas Chammas <nicholas.cham...@gmail.com>
Sent: Wednesday, May 4, 2016
Hi,
Is there any way to write out to S3 the values of a key-value pair RDD?
I'd like each value of a pair to be written to its own file, where the file name
corresponds to the key.
Thanks,
--
Nick
Hi,
After calling RDD.persist(), is it then possible to come back later and access the
persisted RDD?
Let's say, for instance, coming back and starting a new Spark shell session. How
would one access the persisted RDD in the new shell session?
Thanks,
--
Nick
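For context (this is background, not from the thread): a persisted RDD lives only as long as the SparkContext that created it, so a new shell session cannot see it. A minimal Java sketch of the usual workaround, writing the RDD out and re-reading it in the next session; the path and variable names are placeholders:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

// In the original session: persist() caches within this context only.
rdd.persist(StorageLevel.MEMORY_ONLY());

// To make the data outlive the session, write it to storage...
rdd.saveAsObjectFile("hdfs:///tmp/my-rdd");  // placeholder path

// ...then, from a new session's JavaSparkContext jsc, read it back:
JavaRDD<String> restored = jsc.objectFile("hdfs:///tmp/my-rdd");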
Hi,
On AWS EMR 4.2 / Spark 1.5.2, I tried the example here
https://spark.apache.org/docs/1.5.0/sql-programming-guide.html#hive-tables
to load data from a file into a Hive table.
scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> sqlContext.sql("CREATE
Hello,
We have a streaming job that consistently fails with the trace below. This is
on an AWS EMR 4.2/Spark 1.5.2 cluster.
This ticket looks related:
SPARK-8112: Received block event count through the StreamingListener can be negative
although it appears to have been fixed in 1.5.
This seems to be a problem with Kafka brokers being in a bad state. We're
restarting Kafka to resolve.
--
Nick
From: Ted Yu <yuzhih...@gmail.com>
Sent: Friday, January 22, 2016 10:38 AM
To: Afshartous, Nick
Cc: user@spark.apache.org
Subject: Re:
Hi,
In an AWS EMR/Spark 1.5 cluster we're launching a streaming job from the driver
node. Would it make any sense in this case to use cluster mode? More
specifically, would there be any benefit that YARN would provide in cluster mode
but not in client mode?
Thanks,
--
Nick
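For reference, the two launch variants being compared; a minimal spark-submit sketch with a hypothetical application class and jar:

# Client mode: the driver runs on the launching (driver) node.
spark-submit --master yarn --deploy-mode client \
    --class com.example.StreamingApp app.jar

# Cluster mode: YARN runs the driver inside the ApplicationMaster, so the
# launching node can disconnect, and YARN can restart a failed driver
# (see spark.yarn.maxAppAttempts). For a long-running streaming job, that
# restart behavior is the main benefit cluster mode adds over client mode.
spark-submit --master yarn --deploy-mode cluster \
    --class com.example.StreamingApp app.jar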
be isolated.
--
Nick
From: Cody Koeninger <c...@koeninger.org>
Sent: Friday, January 15, 2016 11:46 PM
To: Afshartous, Nick
Cc: user@spark.apache.org
Subject: Re: Consuming commands from a queue
Reading commands from Kafka and triggering a Redshift
Hi,
We have a streaming job that consumes from Kafka and outputs to S3. We're
going to have the job also send commands (to copy from S3 to Redshift) into a
different Kafka topic.
What would be the best framework for consuming and processing the copy
commands? We're considering creating
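One way that could look (a sketch, not something proposed in the thread): a plain Kafka consumer polls the command topic and runs each COPY statement against Redshift over JDBC. The broker, topic, group id, and connection details are placeholders, and the Redshift JDBC driver is assumed to be on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CopyCommandConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");  // placeholder
        props.put("group.id", "redshift-copy");         // placeholder
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection conn = DriverManager.getConnection(
                     "jdbc:redshift://cluster:5439/db", "user", "pass")) {  // placeholders
            consumer.subscribe(Collections.singletonList("copy-commands"));  // placeholder topic
            while (true) {
                // Each message value is assumed to be a complete COPY statement,
                // e.g. COPY events FROM 's3://bucket/prefix' CREDENTIALS ...
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    try (Statement stmt = conn.createStatement()) {
                        stmt.execute(record.value());
                    }
                }
            }
        }
    }
}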
Hi,
I'm trying to configure log4j on an AWS EMR 4.2 Spark cluster for a streaming
job submitted in client mode.
I changed
/etc/spark/conf/log4j.properties
to use a FileAppender. However, INFO logging still goes to the console.
Thanks for any suggestions,
--
Nick
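For comparison, a minimal log4j 1.x properties sketch that routes root logging to a file; the appender name and file path are placeholders:

log4j.rootCategory=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/var/log/spark/streaming-app.log
log4j.appender.file.append=true
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

If INFO still reaches the console with such a file in place, something else is usually reinstalling a ConsoleAppender, e.g. a -Dlog4j.configuration passed via Java options on spark-submit, which matches the conflict reported further down the thread.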
From the console:
Found the issue: a conflict between setting Java options in both
spark-defaults.conf and on the spark-submit command line.
--
Nick
From: Afshartous, Nick <nafshart...@turbine.com>
Sent: Friday, December 18, 2015 11:46 AM
To: user@spark.apache.org
Subject: Confi
Hi,
I'm trying to run a streaming job on a single-node EMR 4.1/Spark 1.5 cluster.
It's throwing an IllegalArgumentException right away on submit.
Attaching full output from console.
Thanks for any insights.
--
Nick
15/12/11 16:44:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
efault in
conf/spark-defaults.conf), do you have all properties defined, especially
spark.yarn.keytab?
Thanks,
Regards
JB
On 12/11/2015 05:49 PM, Afshartous, Nick wrote:
>
> Hi,
>
>
> I'm trying to run a streaming job on a single node EMR 4.1/Spark 1.5
> cluster. It's throwing an Illega
Hi,
On Spark 1.5 on EMR 4.1 the message below appears in stderr in the Yarn UI.
ERROR StatusLogger No log4j2 configuration file found. Using default
configuration: logging only errors to the console.
I do see that there is
/usr/lib/spark/conf/log4j.properties
Can someone please advise
Thanks for the info, though this is a single-node cluster, so that can't be the
cause of the error (which is in the driver log).
--
Nick
From: Jonathan Kelly [jonathaka...@gmail.com]
Sent: Thursday, November 19, 2015 6:45 PM
To: Afshartous, Nick
Cc: user@spark.apache.org
Hi, we are load testing our Spark 1.3 streaming job (reading from Kafka) and
seeing a problem. This is running in AWS/Yarn; the streaming batch interval
is set to 3 minutes and this is a ten-node cluster.
Testing at 30,000 events per second, we are seeing the streaming job get stuck
    val withoutAnalyticsId = sqlContext.sql(
      "select * from ad_info where deviceId = '%1s' order by messageTime desc limit 1" format (deviceId))
    withoutAnalyticsId.take(1)(0)
  }
})
From: Michael Armbrust [mich...@databricks.com]
Sent: Thursday, October