It would be a dream to have an easy-to-use dynamic metric system AND a
reliable counting system (accumulator-like) in Spark...
Thanks
Roberto
On Tue, May 7, 2019 at 3:54 AM Saisai Shao wrote:
> I think the main reason why that was not merged is that Spark itself
> doesn't have such
Hello folks,
I'm trying to work around a dependency issue by specifying at spark-submit
time that I want my (user) classpath to be resolved and taken into account
first, ahead of the jars received through the system classpath (which is
/data/cloudera/parcels/CDH/jars/).
In
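For what it's worth, a minimal sketch of the relevant configuration — an assumption on my part (these are the experimental Spark 1.x switches for preferring user jars), not necessarily the poster's setup:

```scala
import org.apache.spark.SparkConf

// Experimental in Spark 1.x: give the user's jars precedence over the
// system classpath when resolving classes, on driver and executors.
val conf = new SparkConf()
  .set("spark.driver.userClassPathFirst", "true")
  .set("spark.executor.userClassPathFirst", "true")
```

The same settings can also be passed at submit time, e.g. `--conf spark.driver.userClassPathFirst=true`.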
Any chance anyone gave a look at this?
Thanks!
On Wed, Feb 10, 2016 at 10:46 AM, Roberto Coluccio <
roberto.coluc...@gmail.com> wrote:
> Thanks Shixiong!
>
> I'm attaching the thread dumps (I printed the Spark UI after expanding all
> the elements, hope that's fine) and re
Kinesis and open any Receivers.
Thank you!
Roberto
On Tue, Feb 2, 2016 at 4:40 PM, Roberto Coluccio <roberto.coluc...@gmail.com
> wrote:
> Hi,
>
> I've been struggling with an issue ever since I tried to upgrade my Spark
> Streaming solution from 1.4.1 to 1.5+.
>
> I have a S
Hi,
I've been struggling with an issue ever since I tried to upgrade my Spark
Streaming solution from 1.4.1 to 1.5+.
I have a Spark Streaming app which creates 3 ReceiverInputDStreams
leveraging the KinesisUtils.createStream API.
I used to leverage a timeout to terminate my app
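A minimal sketch of that setup, assuming the Spark 1.5+ Kinesis ASL artifact; app name, stream name, region and intervals are placeholders, not the poster's values:

```scala
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils

val ssc = new StreamingContext(new SparkConf().setAppName("kinesis-app"), Seconds(10))

// Three receiver streams on the same Kinesis stream, unioned into one DStream.
val streams = (1 to 3).map { _ =>
  KinesisUtils.createStream(
    ssc, "kinesis-app", "my-stream",
    "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
    InitialPositionInStream.LATEST, Seconds(10),
    StorageLevel.MEMORY_AND_DISK_2)
}
val unified = ssc.union(streams)
```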
Hi there,
I'm facing a weird issue when upgrading my Spark streaming driver from 1.4.1
on EMR 3.9 (hence Hadoop 2.4.0) to Spark 1.5.2 on EMR 4.2 (hence Hadoop
2.6.0).
Basically, the very same driver which used to terminate after a timeout as
expected, now does not. In particular, as long as the
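The timeout-based termination pattern being described looks roughly like this — a sketch, not the poster's actual code; the timeout value is a placeholder:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(new SparkConf().setAppName("timed-app"), Seconds(10))
// ... set up DStreams here ...
ssc.start()

// Wait at most timeoutMs; awaitTerminationOrTimeout returns false if the
// timeout elapsed before the context terminated, in which case we stop it.
val timeoutMs = 30 * 60 * 1000L
if (!ssc.awaitTerminationOrTimeout(timeoutMs)) {
  ssc.stop(stopSparkContext = true, stopGracefully = true)
}
```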
Hello,
I have a batch and a streaming driver using the same functions (Scala). I
use accumulators (passed to the functions' constructors) to count stuff.
In the batch driver, doing so at the right point of the pipeline, I'm able
to retrieve the accumulator value and log it through log4j.
In the
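A sketch of the batch-side pattern described above, using the Spark 1.x accumulator API; the class and accumulator names are illustrative assumptions:

```scala
import org.apache.log4j.Logger
import org.apache.spark.{Accumulator, SparkContext}
import org.apache.spark.rdd.RDD

// A function object that receives the accumulator via its constructor
// and increments it on the executors as records flow through.
class DoSomething(counter: Accumulator[Long]) extends Serializable {
  def process(rdd: RDD[String]): RDD[String] =
    rdd.map { line => counter += 1L; line }
}

def run(sc: SparkContext, input: RDD[String]): Unit = {
  val counter = sc.accumulator(0L, "processed-records")
  val step = new DoSomething(counter)
  step.process(input).count() // an action must run before reading the value
  Logger.getLogger(getClass).info(s"Processed: ${counter.value}")
}
```

In a streaming driver, the reliable place to read the value is on the driver inside a foreachRDD, after each batch completes.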
an automated fashion) for long-running processes like
streaming drivers, and whether there are out-of-the-box solutions.
Thanks,
Roberto
On Thu, Dec 10, 2015 at 3:06 PM, Steve Loughran <ste...@hortonworks.com>
wrote:
>
> > On 10 Dec 2015, at 14:52, Roberto Coluccio <roberto.colu
Hello,
I'm investigating a solution to monitor in real time the Spark logs produced
by my EMR cluster, in order to collect statistics and trigger alarms. Being on
EMR, I found the CloudWatch Logs + Lambda approach pretty straightforward and,
since I'm on AWS, those services are pretty well integrated
Please community, I'd really appreciate your opinion on this topic.
Best regards,
Roberto
-- Forwarded message --
From: Roberto Coluccio roberto.coluc...@gmail.com
Date: Sat, Jul 25, 2015 at 6:28 PM
Subject: [Spark + Hive + EMR + S3] Issue when reading from Hive external
table
Hello Spark community,
I currently have a Spark 1.3.1 batch driver, deployed in YARN-cluster mode
on an EMR cluster (AMI 3.7.0), that reads input data through a HiveContext,
in particular SELECTing data from an EXTERNAL TABLE backed by S3. Such
table has dynamic partitions and contains *hundreds
Hello community,
I'm currently using Spark 1.3.1 with Hive support for outputting processed
data to an external Hive table backed by S3. I'm using a manual
specification of the delimiter, but I'd like to know whether there is any
clean way to write in CSV format:
val sparkConf = new SparkConf()
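One clean option — an assumption on my part, not from the original thread — is the external spark-csv package with the Spark 1.4+ DataFrame writer API; the output path below is a placeholder:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Write a DataFrame as CSV via the spark-csv package, which handles
// delimiters, quoting and escaping instead of a hand-rolled separator.
def writeCsv(df: DataFrame, path: String): Unit =
  df.write
    .format("com.databricks.spark.csv")
    .option("delimiter", ",")
    .mode(SaveMode.Overwrite)
    .save(path) // e.g. an s3:// location backing the external table
```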
I got a similar issue. Might yours be related to this as well:
https://issues.apache.org/jira/browse/SPARK-8368 ?
On Fri, Jun 26, 2015 at 2:00 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Are those provided Spark libraries compatible with Scala 2.11?
Thanks
Best Regards
On Fri, Jun 26,
to cause this. It wouldn't work with anything
less than 256m for a simple piece of code.
1.3.1 used to work with the default (64m, I think).
Srikanth
On Wed, Jun 24, 2015 at 12:47 PM, Roberto Coluccio
roberto.coluc...@gmail.com wrote:
Did you try to pass it with
--driver-java-options -XX:MaxPermSize
Did you try to pass it with
--driver-java-options -XX:MaxPermSize=256m
as spark-shell input argument?
Roberto
On Wed, Jun 24, 2015 at 5:57 PM, stati srikanth...@gmail.com wrote:
Hello,
I moved from 1.3.1 to 1.4.0 and started receiving
java.lang.OutOfMemoryError: PermGen space when I
I confirm,
Christopher was very kind helping me out here. The solution presented in the
linked doc worked perfectly. IMO it should be linked in the official Spark
documentation.
Thanks again,
Roberto
On 20 Jun 2015, at 19:25, Bozeman, Christopher bozem...@amazon.com wrote:
We worked it
Hi!
I'm struggling with an issue with Spark 1.3.1 running on YARN, on an AWS
EMR cluster based on AMI 3.7.0 (hence Amazon Linux 2015.03, with Hive 0.13
already installed and configured, Hadoop 2.4, etc.). I make use of the AWS
emr-bootstrap-action
Hi everybody,
When trying to upgrade from Spark 1.1.1 to Spark 1.2.x (I tried both 1.2.0
and 1.2.1), I encounter a weird error that never occurred before, about
which I'd kindly ask for any possible help.
In particular, all my Spark SQL queries fail with the following exception:
, it depends on the
actual contents of your query.
Yin has opened a PR for this; although not merged yet, it should be a
valid fix: https://github.com/apache/spark/pull/5078
This fix will be included in 1.3.1.
Cheng
On 3/18/15 10:04 PM, Roberto Coluccio wrote:
Hi Cheng, thanks for your
anyone who will help me out
here.
Roberto
On Wed, Mar 18, 2015 at 12:09 PM, Cheng Lian lian.cs@gmail.com wrote:
Would you mind providing the query? If it's confidential, could you
please help construct a query that reproduces this issue?
Cheng
On 3/18/15 6:03 PM, Roberto Coluccio
Hey Cheng, thank you so much for your suggestion, the problem was actually
a column/field called timestamp in one of the case classes!! Once I
changed its name everything worked out fine again. Let me say it was kinda
frustrating ...
Roberto
On Wed, Mar 18, 2015 at 4:07 PM, Roberto Coluccio
Hello folks,
I'm trying to deploy a Spark driver on Amazon EMR in yarn-cluster mode,
expecting to be able to access the Spark UI at the spark-master-ip:4040
address (the default port). The problem is that the Spark UI port is
always chosen at random at runtime, although I also tried to specify
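One thing worth checking — a sketch, assuming the standard `spark.ui.port` setting; note that in yarn-cluster mode the UI runs on the driver's container and is usually reached through the YARN proxy:

```scala
import org.apache.spark.SparkConf

// Pin the UI port explicitly. If the port is already taken, Spark retries
// on successive ports (up to spark.port.maxRetries), which is one way the
// UI can end up on a seemingly random port.
val conf = new SparkConf().set("spark.ui.port", "4040")
```

Equivalently, pass `--conf spark.ui.port=4040` at spark-submit time.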
Hello folks,
I have a Spark Streaming application built with Maven (as a jar) and
deployed with the spark-submit script. The application project has the
following (main) structure:
myApp
  src
    main
      scala
        com.mycompany.package
          MyApp.scala
          DoSomething.scala
          ...
      resources
        aPerlScript.pl
        ...
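The usual gotcha with this layout is that aPerlScript.pl ends up packaged inside the jar, so it cannot be executed by filesystem path; it has to be read as a classpath resource and copied out first. A sketch (the helper name is hypothetical):

```scala
import java.io.{File, FileOutputStream}

object ResourceExtractor {
  // Copy a bundled classpath resource (e.g. "/aPerlScript.pl") to a temp
  // file on the local filesystem; returns None if the resource is absent.
  def extract(resourcePath: String): Option[File] =
    Option(getClass.getResourceAsStream(resourcePath)).map { in =>
      val tmp = File.createTempFile("resource-", ".extracted")
      val out = new FileOutputStream(tmp)
      try {
        val buf = new Array[Byte](4096)
        Iterator.continually(in.read(buf)).takeWhile(_ != -1)
          .foreach(n => out.write(buf, 0, n))
      } finally { out.close(); in.close() }
      tmp
    }
}
```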