Re: Dynamic metric names

2019-05-07 Thread Roberto Coluccio
It would be a dream to have an easy-to-use dynamic metric system AND a reliable counting system (accumulator-like) in Spark... Thanks Roberto On Tue, May 7, 2019 at 3:54 AM Saisai Shao wrote: > I think the main reason why that was not merged is that Spark itself > doesn't have such
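(A minimal sketch of the accumulator-style counting wished for here, using the Spark 1.x named-accumulator API; the input path and counter names are placeholders.)

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CounterSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("counters"))

    // Named accumulators show up per stage in the Spark UI (Spark 1.x API).
    val parsed    = sc.accumulator(0L, "parsed-records")
    val malformed = sc.accumulator(0L, "malformed-records")

    sc.textFile("s3://my-bucket/input/*").foreach { line =>
      if (line.split(",").length >= 3) parsed += 1L else malformed += 1L
    }

    // Accumulator values are only reliable after the action has completed.
    println(s"parsed=${parsed.value} malformed=${malformed.value}")
    sc.stop()
  }
}
```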

userClassPathFirst=true prevents SparkContext from being initialized

2017-01-30 Thread Roberto Coluccio
Hello folks, I'm trying to work around an issue with some dependencies by specifying at spark-submit time that I want my (user) classpath to be resolved and taken into account first (ahead of the jars received through the System Classpath, which is /data/cloudera/parcels/CDH/jars/). In
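(For reference, a sketch of the two properties involved, assuming the standard spark.{driver,executor}.userClassPathFirst settings, which are marked experimental in this Spark line; these flags shadowing a core class with a user jar is one known way to break SparkContext startup.)

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Equivalent to passing at submit time:
//   spark-submit --conf spark.driver.userClassPathFirst=true \
//                --conf spark.executor.userClassPathFirst=true ...
// The driver-side flag is read when the driver JVM starts, so in client mode
// it must come from the command line or spark-defaults.conf, not from code.
val conf = new SparkConf()
  .setAppName("user-classpath-first")
  .set("spark.executor.userClassPathFirst", "true")

val sc = new SparkContext(conf)
```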

Re: [Spark 1.5+] ReceiverTracker seems not to stop Kinesis receivers

2016-02-23 Thread Roberto Coluccio
Any chance anyone has taken a look at this? Thanks! On Wed, Feb 10, 2016 at 10:46 AM, Roberto Coluccio < roberto.coluc...@gmail.com> wrote: > Thanks Shixiong! > > I'm attaching the thread dumps (I printed the Spark UI after expanding all > the elements, hope that's fine) and re

Re: [Spark 1.5+] ReceiverTracker seems not to stop Kinesis receivers

2016-02-09 Thread Roberto Coluccio
Kinesis and open any Receivers. Thank you! Roberto On Tue, Feb 2, 2016 at 4:40 PM, Roberto Coluccio <roberto.coluc...@gmail.com > wrote: > Hi, > > I've been struggling with an issue ever since I tried to upgrade my Spark > Streaming solution from 1.4.1 to 1.5+. > > I have a S

[Spark 1.5+] ReceiverTracker seems not to stop Kinesis receivers

2016-02-02 Thread Roberto Coluccio
Hi, I've been struggling with an issue ever since I tried to upgrade my Spark Streaming solution from 1.4.1 to 1.5+. I have a Spark Streaming app which creates 3 ReceiverInputDStreams leveraging the KinesisUtils.createStream API. I used to leverage a timeout to terminate my app
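(A minimal sketch of the setup described, against the Spark 1.5 KinesisUtils API; application/stream names, endpoint, region, and the timeout value are placeholders.)

```scala
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils

val ssc = new StreamingContext(new SparkConf().setAppName("kinesis-app"), Seconds(10))

// Three receiver-based input streams, as in the thread.
val streams = (1 to 3).map { _ =>
  KinesisUtils.createStream(
    ssc, "kinesis-app", "myStream",
    "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
    InitialPositionInStream.LATEST, Seconds(10), StorageLevel.MEMORY_AND_DISK_2)
}
ssc.union(streams).count().print()

ssc.start()
// Stop after a timeout; stopGracefully = true asks the ReceiverTracker to
// shut the Kinesis receivers down before the context is torn apart.
if (!ssc.awaitTerminationOrTimeout(30 * 60 * 1000L)) {
  ssc.stop(stopSparkContext = true, stopGracefully = true)
}
```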

Spark 1.5.2 streaming driver in YARN cluster mode on Hadoop 2.6 (on EMR 4.2) restarts after stop

2016-01-14 Thread Roberto Coluccio
Hi there, I'm facing a weird issue when upgrading from a Spark 1.4.1 streaming driver on EMR 3.9 (hence Hadoop 2.4.0) to Spark 1.5.2 on EMR 4.2 (hence Hadoop 2.6.0). Basically, the very same driver, which used to terminate after a timeout as expected, now does not. In particular, as long as the
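(A hedged guess at one usual culprit, not necessarily the cause here: in yarn-cluster mode YARN re-runs the ApplicationMaster after a non-successful exit, so a driver that was stopped on purpose can come back. Capping the attempts is a one-line mitigation.)

```scala
import org.apache.spark.SparkConf

// yarn.resourcemanager.am.max-attempts on the cluster side allows the AM to
// be resubmitted; spark.yarn.maxAppAttempts caps it from the application
// side, so a deliberately stopped streaming driver is not restarted by YARN.
val conf = new SparkConf()
  .setAppName("streaming-driver")
  .set("spark.yarn.maxAppAttempts", "1")
```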

Spark Streaming - print accumulators value every period as logs

2015-12-24 Thread Roberto Coluccio
Hello, I have a batch and a streaming driver using the same functions (Scala). I use accumulators (passed to function constructors) to count stuff. In the batch driver, doing so at the right point of the pipeline, I'm able to retrieve the accumulator value and print it as a log4j log. In the
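(A sketch of one way to get this in the streaming driver: read the accumulator on the driver inside foreachRDD, which runs once per batch, and emit its value through log4j. Function and variable names are hypothetical.)

```scala
import org.apache.log4j.Logger
import org.apache.spark.Accumulator
import org.apache.spark.streaming.dstream.DStream

def countAndLog(stream: DStream[String], errors: Accumulator[Long]): Unit = {
  stream.foreachRDD { rdd =>
    // Runs on the executors: the accumulator is incremented per record.
    rdd.foreach { line => if (line.isEmpty) errors += 1L }
    // Runs on the driver after the batch's job completes, so the cumulative
    // value is up to date here and lands in the log4j output every period.
    Logger.getLogger("batch-counters").info(s"empty lines so far: ${errors.value}")
  }
}
```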

Re: Spark on EMR: out-of-the-box solution for real-time application logs monitoring?

2015-12-11 Thread Roberto Coluccio
an automated fashion) for long-running processes like streaming drivers, and whether there are out-of-the-box solutions. Thanks, Roberto On Thu, Dec 10, 2015 at 3:06 PM, Steve Loughran <ste...@hortonworks.com> wrote: > > > On 10 Dec 2015, at 14:52, Roberto Coluccio <roberto.colu

Spark on EMR: out-of-the-box solution for real-time application logs monitoring?

2015-12-10 Thread Roberto Coluccio
Hello, I'm investigating a solution to monitor in real time the Spark logs produced by my EMR cluster, in order to collect statistics and trigger alarms. Being on EMR, I found the CloudWatch Logs + Lambda approach pretty straightforward and, since I'm on AWS, those services are pretty well integrated

Fwd: [Spark + Hive + EMR + S3] Issue when reading from Hive external table backed on S3 with large amount of small files

2015-08-07 Thread Roberto Coluccio
Please, community, I'd really appreciate your opinion on this topic. Best regards, Roberto -- Forwarded message -- From: Roberto Coluccio roberto.coluc...@gmail.com Date: Sat, Jul 25, 2015 at 6:28 PM Subject: [Spark + Hive + EMR + S3] Issue when reading from Hive external table

[Spark + Hive + EMR + S3] Issue when reading from Hive external table backed on S3 with large amount of small files

2015-07-25 Thread Roberto Coluccio
Hello Spark community, I currently have a Spark 1.3.1 batch driver, deployed in YARN-cluster mode on an EMR cluster (AMI 3.7.0), that reads input data through a HiveContext, in particular SELECTing data from an EXTERNAL TABLE backed on S3. The table has dynamic partitions and contains *hundreds
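(For reference, a minimal sketch of the read side described, with placeholder table and partition names; the repartition right after the SELECT is one common mitigation, so downstream stages don't inherit one tiny partition per S3 object.)

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("hive-s3-read"))
val hiveContext = new HiveContext(sc)

// Each small S3 file yields at least one input partition, so a table of
// hundreds of thousands of tiny files produces as many tasks.
val df = hiveContext.sql("SELECT * FROM my_external_table WHERE dt = '2015-07-25'")

// Compact early: repartition is available on DataFrame since Spark 1.3.
val compacted = df.repartition(64)
compacted.count()
```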

Spark 1.3.1 + Hive: write output to CSV with header on S3

2015-07-17 Thread Roberto Coluccio
Hello community, I'm currently using Spark 1.3.1 with Hive support for outputting processed data to an external Hive table backed on S3. I'm using a manual specification of the delimiter, but I'd like to know whether there is any clean way to write in CSV format: val sparkConf = new SparkConf()
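(There is no built-in CSV writer in Spark 1.3.1; one clean route is the third-party spark-csv package, com.databricks:spark-csv. Sketched below with the Spark 1.4+ writer syntax for brevity; path and options are placeholders.)

```scala
import org.apache.spark.sql.DataFrame

def writeCsvWithHeader(df: DataFrame): Unit = {
  df.write
    .format("com.databricks.spark.csv")
    .option("header", "true")   // first line of every part file is the header
    .option("delimiter", ",")
    .save("s3n://my-bucket/reports/")
}
```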

Re: Spark 1.4 RDD to DF fails with toDF()

2015-06-26 Thread Roberto Coluccio
I got a similar issue. Might yours be related to this as well: https://issues.apache.org/jira/browse/SPARK-8368 ? On Fri, Jun 26, 2015 at 2:00 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Are those provided Spark libraries compatible with Scala 2.11? Thanks Best Regards On Fri, Jun 26,
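(For comparison, a sketch of the toDF pattern that avoids the usual pitfalls: case class defined at top level and the sqlContext implicits in scope; names are illustrative.)

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Case classes used with toDF should live at top level, not inside a method.
case class Person(name: String, age: Int)

object ToDfSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("todf"))
    val sqlContext = new SQLContext(sc)
    // toDF is provided by this import; forgetting it is the most common
    // cause of "value toDF is not a member of org.apache.spark.rdd.RDD".
    import sqlContext.implicits._

    val df = sc.parallelize(Seq(Person("a", 1), Person("b", 2))).toDF()
    df.show()
  }
}
```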

Re: java.lang.OutOfMemoryError: PermGen space

2015-06-25 Thread Roberto Coluccio
to cause this. It wouldn't work with anything less than 256m for a simple piece of code. 1.3.1 used to work with the default (64m, I think). Srikanth On Wed, Jun 24, 2015 at 12:47 PM, Roberto Coluccio roberto.coluc...@gmail.com wrote: Did you try to pass it with --driver-java-options -XX:MaxPermSize

Re: java.lang.OutOfMemoryError: PermGen space

2015-06-24 Thread Roberto Coluccio
Did you try to pass it with --driver-java-options -XX:MaxPermSize=256m as a spark-shell input argument? Roberto On Wed, Jun 24, 2015 at 5:57 PM, stati srikanth...@gmail.com wrote: Hello, I moved from 1.3.1 to 1.4.0 and started receiving java.lang.OutOfMemoryError: PermGen space when I
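(The same flag, sketched both ways; note the SparkConf route only takes effect where the driver JVM is launched by the cluster, e.g. yarn-cluster mode, since a client-mode driver is already running by the time the conf is read.)

```scala
import org.apache.spark.SparkConf

// Command-line form, for spark-shell or client-mode drivers:
//   spark-shell --driver-java-options "-XX:MaxPermSize=256m"
// Configuration form, effective for cluster-launched JVMs:
val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions", "-XX:MaxPermSize=256m")
  .set("spark.executor.extraJavaOptions", "-XX:MaxPermSize=256m")
```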

Re: [Spark 1.3.1 on YARN on EMR] Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

2015-06-20 Thread Roberto Coluccio
I confirm: Christopher was very kind in helping me out here. The solution presented in the linked doc worked perfectly. IMO it should be linked in the official Spark documentation. Thanks again, Roberto On 20 Jun 2015, at 19:25, Bozeman, Christopher bozem...@amazon.com wrote: We worked it

[Spark 1.3.1 on YARN on EMR] Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

2015-06-10 Thread Roberto Coluccio
Hi! I'm struggling with an issue with Spark 1.3.1 running on YARN on an AWS EMR cluster. The cluster is based on AMI 3.7.0 (hence Amazon Linux 2015.03, Hive 0.13 already installed and configured on the cluster, Hadoop 2.4, etc...). I make use of the AWS emr-bootstrap-action

Spark SQL weird exception after upgrading from 1.1.1 to 1.2.x

2015-03-18 Thread Roberto Coluccio
Hi everybody, when trying to upgrade from Spark 1.1.1 to Spark 1.2.x (I tried both 1.2.0 and 1.2.1) I encounter a weird error that never occurred before, about which I'd kindly ask for any possible help. In particular, all my Spark SQL queries fail with the following exception:

Re: Spark SQL weird exception after upgrading from 1.1.1 to 1.2.x

2015-03-18 Thread Roberto Coluccio
, it depends on the actual contents of your query. Yin has opened a PR for this; although not merged yet, it should be a valid fix: https://github.com/apache/spark/pull/5078 This fix will be included in 1.3.1. Cheng On 3/18/15 10:04 PM, Roberto Coluccio wrote: Hi Cheng, thanks for your

Re: Spark SQL weird exception after upgrading from 1.1.1 to 1.2.x

2015-03-18 Thread Roberto Coluccio
anyone who will help me out here. Roberto On Wed, Mar 18, 2015 at 12:09 PM, Cheng Lian lian.cs@gmail.com wrote: Would you mind providing the query? If it's confidential, could you please help construct a query that reproduces this issue? Cheng On 3/18/15 6:03 PM, Roberto Coluccio

Re: Spark SQL weird exception after upgrading from 1.1.1 to 1.2.x

2015-03-18 Thread Roberto Coluccio
Hey Cheng, thank you so much for your suggestion, the problem was actually a column/field called timestamp in one of the case classes!! Once I changed its name, everything worked out fine again. Let me say it was kinda frustrating ... Roberto On Wed, Mar 18, 2015 at 4:07 PM, Roberto Coluccio
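(A reconstruction of the fix described above; field names are illustrative, and the backtick workaround is an assumption that depends on the SQL dialect in use.)

```scala
// Spark SQL 1.2.x chokes on an attribute named like the TIMESTAMP type,
// so a case-class field with that name breaks otherwise valid queries.
case class EventBroken(timestamp: Long, message: String) // triggers the error
case class EventFixed(eventTime: Long, message: String)  // renamed: works

// If renaming is not an option, quoting may help depending on the dialect:
//   sqlContext.sql("SELECT `timestamp`, message FROM events")
```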

Spark UI port issue when deploying Spark driver on YARN in yarn-cluster mode on EMR

2014-12-23 Thread Roberto Coluccio
Hello folks, I'm trying to deploy a Spark driver on Amazon EMR in yarn-cluster mode, expecting to be able to access the Spark UI at the spark-master-ip:4040 address (the default port). The problem here is that the Spark UI port is always defined randomly at runtime, although I also tried to specify
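(A sketch of pinning the port; spark.ui.port is the standard knob, and spark.port.maxRetries explains why the port can drift when 4040 is taken. Whether it solves the EMR case is not guaranteed: in yarn-cluster mode the UI runs on the application master's host and is normally reached through the YARN ResourceManager proxy link rather than spark-master-ip:4040.)

```scala
import org.apache.spark.SparkConf

// Pin the UI port. If it is already bound, Spark tries the next ports up to
// spark.port.maxRetries times, which makes the port look random at runtime.
val conf = new SparkConf()
  .set("spark.ui.port", "4040")
  .set("spark.port.maxRetries", "0")
```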

Access resources from jar-local resources folder

2014-09-23 Thread Roberto Coluccio
Hello folks, I have a Spark Streaming application built with Maven (as a jar) and deployed with the spark-submit script. The application project has the following (main) structure:

myApp
  src/main/scala/com.mycompany.package/
    MyApp.scala
    DoSomething.scala
    ...
  src/main/resources/
    aPerlScript.pl
    ...
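(A sketch of the standard way to reach such a resource: files under src/main/resources are packaged inside the jar, so they are read as classpath streams, not filesystem paths; the temp-file copy below is for handing the script to an external interpreter. The resource name matches the layout above.)

```scala
import java.io.File
import java.nio.file.{Files, StandardCopyOption}

// Resources bundled in the jar have no real path; open them as a stream.
val in = getClass.getResourceAsStream("/aPerlScript.pl")
require(in != null, "resource not found on the classpath")

// Tools that need an actual file (e.g. a Perl interpreter) get a temp copy.
val tmp = File.createTempFile("aPerlScript", ".pl")
tmp.deleteOnExit()
Files.copy(in, tmp.toPath, StandardCopyOption.REPLACE_EXISTING)
in.close()
// tmp.getAbsolutePath can now be passed to ProcessBuilder / Runtime.exec.
```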