Re: Getting List of Executor IDs

2019-05-13 Thread Afshartous, Nick
Answering my own question: looks like this can be done by implementing SparkListener with the method def onExecutorAdded(executorAdded: SparkListenerExecutorAdded): Unit, as the SparkListenerExecutorAdded object has the info. -- Nick
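
A minimal sketch of that approach in Scala, assuming the Spark 2.x scheduler listener API (the ExecutorTracker class name is illustrative):

    import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorAdded}

    // Collects executorId -> host as executors register with the driver.
    class ExecutorTracker extends SparkListener {
      val executors = scala.collection.mutable.Map[String, String]()
      override def onExecutorAdded(executorAdded: SparkListenerExecutorAdded): Unit = {
        executors(executorAdded.executorId) = executorAdded.executorInfo.executorHost
      }
    }

    // Register on the driver before running the job:
    //   val tracker = new ExecutorTracker
    //   sc.addSparkListener(tracker)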

Getting List of Executor IDs

2019-05-13 Thread Afshartous, Nick
Hi, I'm using Spark 2.3 and looking for a Java API to fetch the list of executors. I need host and ID info for the executors. Thanks for any pointers, -- Nick

[ Spark Streaming & Kafka 0.10 ] Possible bug

2017-03-22 Thread Afshartous, Nick
Hi, I think I'm seeing a bug while upgrading to the Kafka 0.10 streaming API. Code fragments follow. -- Nick JavaInputDStream<...> rawStream = getDirectKafkaStream(); JavaDStream<...> messagesTuple =
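
For context, the 0.10 direct stream is normally constructed along these lines (a Scala sketch following the streaming-kafka-0-10 docs; the broker address, group id, and topic are placeholders, and ssc is an existing StreamingContext):

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "example-group",
      "auto.offset.reset"  -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Each record in the resulting stream is a ConsumerRecord[String, String].
    val rawStream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
    )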

[Spark Kafka] API Doc pages for Kafka 0.10 not current

2017-02-27 Thread Afshartous, Nick
Hello, Looks like the API docs linked from the Spark Kafka 0.10 Integration page are not current. For instance, on the page https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html the code examples show the new API (e.g. class ConsumerStrategies). However,

Error making REST call from streaming app

2016-05-23 Thread Afshartous, Nick
Hi, We got the following exception trying to initiate a REST call from the Spark app. This is running Spark 1.5.2 in AWS/YARN. It's only happened once during the course of a streaming app that has been running for months. Just curious if anyone could shed some more light on root

Re: Writing output of key-value Pair RDD

2016-05-05 Thread Afshartous, Nick
protected String generateFileNameForKeyValue(String key, String value, String name) { return key; } @Override protected String generateActualKey(String key, String value) { return ""; } } From: Afshartous, Nick <nafshart..
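
A complete Scala version of the pattern in those fragments (the output path is a placeholder):

    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

    // Routes each key's values to a file named after the key and drops the
    // key itself from the written records.
    class RDDMultipleTextOutputFormat extends MultipleTextOutputFormat[String, String] {
      override def generateFileNameForKeyValue(key: String, value: String, name: String): String = key
      override def generateActualKey(key: String, value: String): String = ""
    }

    // Given pairs: RDD[(String, String)]
    //   pairs.saveAsHadoopFile("s3://bucket/output", classOf[String], classOf[String],
    //     classOf[RDDMultipleTextOutputFormat])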

Re: Writing output of key-value Pair RDD

2016-05-05 Thread Afshartous, Nick
parallelize(Arrays.asList(strings)) .mapToPair(pairFunction) .saveAsHadoopFile("s3://...", String.class, String.class, RDDMultipleTextOutputFormat.class); From: Nicholas Chammas <nicholas.cham...@gmail.com> Sent: Wednesday, May 4, 2016

Writing output of key-value Pair RDD

2016-05-04 Thread Afshartous, Nick
Hi, Is there any way to write out to S3 the values of a key-value Pair RDD? I'd like each value of a pair to be written to its own file, where the file name corresponds to the key name. Thanks, -- Nick

Reading Back a Cached RDD

2016-03-24 Thread Afshartous, Nick
Hi, After calling RDD.persist(), is it then possible to come back later and access the persisted RDD? Let's say, for instance, coming back and starting a new Spark shell session. How would one access the persisted RDD in the new shell session? Thanks, -- Nick
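
For what it's worth, persist() only caches within the running application, so a new shell session cannot see it; a common workaround (not from this thread, and the path and element type below are illustrative) is to write the RDD to durable storage and read it back:

    // First session: materialize to storage.
    rdd.saveAsObjectFile("s3://bucket/cached-rdd")

    // New session: read it back (assuming an RDD of Strings here).
    val restored = sc.objectFile[String]("s3://bucket/cached-rdd")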

Using Spark SQL / Hive on AWS EMR

2016-03-03 Thread Afshartous, Nick
Hi, On AWS EMR 4.2 / Spark 1.5.2, I tried the example here https://spark.apache.org/docs/1.5.0/sql-programming-guide.html#hive-tables to load data from a file into a Hive table. scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) scala> sqlContext.sql("CREATE
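
For reference, the example on that docs page continues roughly as follows (table name and input path are from the Spark 1.5 guide):

    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

    sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
    sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")

    // Queries are expressed in HiveQL.
    sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)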

Spark Streaming : requirement failed: numRecords must not be negative

2016-01-22 Thread Afshartous, Nick
Hello, We have a streaming job that consistently fails with the trace below. This is on an AWS EMR 4.2/Spark 1.5.2 cluster. This ticket looks related: SPARK-8112 ("Received block event count through the StreamingListener can be negative"), although that appears to have been fixed in 1.5.

Re: Spark Streaming : requirement failed: numRecords must not be negative

2016-01-22 Thread Afshartous, Nick
This seems to be a problem with Kafka brokers being in a bad state. We're restarting Kafka to resolve. -- Nick From: Ted Yu <yuzhih...@gmail.com> Sent: Friday, January 22, 2016 10:38 AM To: Afshartous, Nick Cc: user@spark.apache.org Subject: Re:

Client versus cluster mode

2016-01-21 Thread Afshartous, Nick
Hi, In an AWS EMR/Spark 1.5 cluster we're launching a streaming job from the driver node. Would it make any sense in this case to use cluster mode? More specifically, would there be any benefit that YARN would provide in cluster mode but not in client mode? Thanks, -- Nick

Re: Consuming commands from a queue

2016-01-16 Thread Afshartous, Nick
be isolated. -- Nick From: Cody Koeninger <c...@koeninger.org> Sent: Friday, January 15, 2016 11:46 PM To: Afshartous, Nick Cc: user@spark.apache.org Subject: Re: Consuming commands from a queue Reading commands from kafka and triggering a redshif

Consuming commands from a queue

2016-01-15 Thread Afshartous, Nick
Hi, We have a streaming job that consumes from Kafka and outputs to S3. We're going to have the job also send commands (to copy from S3 to Redshift) into a different Kafka topic. What would be the best framework for consuming and processing the copy commands? We're considering creating

Configuring log4j

2015-12-18 Thread Afshartous, Nick
Hi, I'm trying to configure log4j on an AWS EMR 4.2 Spark cluster for a streaming job running in client mode. I changed /etc/spark/conf/log4j.properties to use a FileAppender. However, the INFO logging still goes to the console. Thanks for any suggestions, -- Nick From the console:
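
As an aside, a minimal log4j.properties using a FileAppender of the kind described might look like this (the log file path is a placeholder):

    log4j.rootCategory=INFO, file
    log4j.appender.file=org.apache.log4j.FileAppender
    log4j.appender.file.File=/var/log/spark/streaming.log
    log4j.appender.file.Append=true
    log4j.appender.file.layout=org.apache.log4j.PatternLayout
    log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n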

Re: Configuring log4j

2015-12-18 Thread Afshartous, Nick
Found the issue: a conflict between setting Java options in both spark-defaults.conf and the spark-submit command. -- Nick From: Afshartous, Nick <nafshart...@turbine.com> Sent: Friday, December 18, 2015 11:46 AM To: user@spark.apache.org Subject: Confi

Spark Submit - java.lang.IllegalArgumentException: requirement failed

2015-12-11 Thread Afshartous, Nick
Hi, I'm trying to run a streaming job on a single-node EMR 4.1/Spark 1.5 cluster. It's throwing an IllegalArgumentException right away on submit. Attaching full output from the console. Thanks for any insights. -- Nick 15/12/11 16:44:43 WARN util.NativeCodeLoader: Unable to load

Re: Spark Submit - java.lang.IllegalArgumentException: requirement failed

2015-12-11 Thread Afshartous, Nick
default in conf/spark-default.conf), do you have all properties defined, especially spark.yarn.keytab? Thanks, Regards JB On 12/11/2015 05:49 PM, Afshartous, Nick wrote: > > Hi, > > > I'm trying to run a streaming job on a single node EMR 4.1/Spark 1.5 > cluster. It's throwing an Illega
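
For anyone hitting the same thing, the Kerberos properties referenced above are set in spark-defaults.conf along these lines (keytab path and principal are placeholders):

    spark.yarn.keytab     /etc/security/keytabs/nick.keytab
    spark.yarn.principal  nick@EXAMPLE.COM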

Configuring Log4J (Spark 1.5 on EMR 4.1)

2015-11-19 Thread Afshartous, Nick
Hi, On Spark 1.5 on EMR 4.1, the message below appears in stderr in the YARN UI. ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. I do see that there is /usr/lib/spark/conf/log4j.properties Can someone please advise

RE: Configuring Log4J (Spark 1.5 on EMR 4.1)

2015-11-19 Thread Afshartous, Nick
Thanks for the info, though this is a single-node cluster, so that can't be the cause of the error (which is in the driver log). -- Nick From: Jonathan Kelly [jonathaka...@gmail.com] Sent: Thursday, November 19, 2015 6:45 PM To: Afshartous, Nick Cc: u

Spark/Kafka Streaming Job Gets Stuck

2015-10-28 Thread Afshartous, Nick
Hi, we are load testing our Spark 1.3 streaming job (reading from Kafka) and seeing a problem. This is running in AWS/YARN, the streaming batch interval is set to 3 minutes, and this is a ten-node cluster. Testing at 30,000 events per second, we are seeing the streaming job get stuck

RE: Using Spark SQL mapping over an RDD

2015-10-08 Thread Afshartous, Nick
withoutAnalyticsId = sqlContext.sql("select * from ad_info where deviceId = '%1s' order by messageTime desc limit 1" format (deviceId)) withoutAnalyticsId.take(1)(0) } }) From: Michael Armbrust [mich...@databricks.com] Sent: Thursday, October