<id>test</id>
<goals>
  <goal>test</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
On Tue, Aug 25, 2015 at 2:10 PM, Mike Trienis mike.trie...@orcsol.com
wrote:
Hello,
I am using sbt and created a unit test where I create a `HiveContext` and
execute some query and then return. Each time I run the unit test the JVM
will increase its memory usage until I get the error:
Internal error when running tests: java.lang.OutOfMemoryError: PermGen space
Exception
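A common workaround on pre-Java-8 JVMs, sketched here as an assumption rather
than a confirmed fix for this particular case, is to fork the test JVM and
raise its PermGen cap in build.sbt (the size below is chosen arbitrarily):

    // build.sbt (sbt 0.13-era keys)
    fork in Test := true  // javaOptions only applies to a forked JVM
    javaOptions in Test += "-XX:MaxPermSize=512m"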
Hi All,
I would like some clarification regarding window functions for Apache Spark
1.4.0
-
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
In particular, the rowsBetween
* {{{
*   val w = Window.partitionBy("name").orderBy("id")
*   df.select(
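The scaladoc snippet is cut off above; for context, a minimal sketch of how
rowsBetween is typically used against the Spark 1.4 API (column names are
hypothetical):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    // Frame of all preceding rows through 2 rows after the current row, per partition
    val w = Window.partitionBy("name").orderBy("id")
    df.select(
      sum("price").over(w.rowsBetween(Long.MinValue, 2)),
      avg("price").over(w.rowsBetween(0, 4))  // current row through 4 following rows
    )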
Hi All,
I have an RDD of case class objects.
scala> case class Entity(
     |   value: String,
     |   identifier: String
     | )
defined class Entity

scala> Entity("hello", "id1")
res25: Entity = Entity(hello,id1)
During a map operation, I'd like to return a new RDD that contains all of
the
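The message is truncated here; as a minimal sketch of the general pattern (the
transformation itself is hypothetical), a map over an RDD of such case class
objects looks like:

    import org.apache.spark.rdd.RDD

    case class Entity(value: String, identifier: String)

    // Hypothetical transformation: derive a new Entity from each record
    def transform(entities: RDD[Entity]): RDD[Entity] =
      entities.map(e => e.copy(value = e.value.toUpperCase))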
[mailto:rhbutani.sp...@gmail.com]
*Sent:* Monday, July 20, 2015 5:37 PM
*To:* Mohammed Guller
*Cc:* Michael Armbrust; Mike Trienis; user@spark.apache.org
*Subject:* Re: Data frames select and where clause dependency
Yes via: org.apache.spark.sql.catalyst.optimizer.ColumnPruning
See
I'd like to understand why the where field must exist in the select clause.
For example, the following select statement works fine
- df.select("field1", "filter_field").filter(df("filter_field") === "value").show()
However, the next one fails with the error in operator !Filter
(filter_field#60 =
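For contrast, a sketch of the failing variant as I understand it (column and
value names hypothetical): the filter references a column that the preceding
select has already pruned away.

    // Works: filter_field survives the projection
    df.select("field1", "filter_field").filter(df("filter_field") === "value").show()

    // Fails with the !Filter error: filter_field was dropped by the select
    df.select("field1").filter(df("filter_field") === "value").show()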
Hello,
I'd like to understand how other people have been aggregating metrics
using Spark Streaming and a Cassandra database. Currently I have designed
some data models that will store the rolled-up metrics. There are two
models that I am considering:
CREATE TABLE rollup_using_counters (
since they are usually foreground processes.
With the master it's a bit more complicated: ./sbin/start-master.sh goes
into the background, which is not good for supervisord, but I think it's
doable (I'm going to set it up too in a few days).
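A minimal sketch of such a supervisord entry (paths hypothetical): running
bin/spark-class directly keeps the Master in the foreground, unlike
start-master.sh.

    [program:spark-master]
    ; spark-class does not daemonize, so supervisord can track the process
    command=/opt/spark/bin/spark-class org.apache.spark.deploy.master.Master
    autorestart=true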
On 3 June 2015 at 21:46, Mike Trienis mike.trie...@orcsol.com wrote:
Hi All,
I am curious to know if anyone has successfully deployed a Spark cluster
using supervisord?
- http://supervisord.org/
Currently I am using the cluster launch scripts, which are working great;
however, every time I reboot my VM or development environment I need to
re-launch the
core, an executor is simply a JVM instance and
as such it can be granted any number of cores and any amount of RAM.
So check how many cores you have per executor.
Sent from Samsung Mobile
Original message
From: Mike Trienis
Date:2015/05/22 21:51 (GMT+00:00)
To: user@spark.apache.org
Hi All,
I have a cluster of four nodes (three workers and one master, with one core
each) which consumes data from Kinesis at 15 second intervals using two
streams (i.e. receivers). The job simply grabs the latest batch and pushes
it to MongoDB. I believe that the problem is that all tasks are
I guess each receiver occupies an executor. So there was only one executor
available for processing the job.
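To make the arithmetic concrete, a sketch assuming the Spark 1.3 Kinesis API
and an existing StreamingContext ssc (stream name and endpoint are
placeholders): each receiver permanently occupies one core, so two receivers
on three one-core workers leave a single core for all batch processing.

    import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.Seconds
    import org.apache.spark.streaming.kinesis.KinesisUtils

    // Two receivers -> two cores pinned; union the streams so the remaining
    // core processes a single combined DStream.
    val streams = (0 until 2).map { _ =>
      KinesisUtils.createStream(ssc, "myStream",
        "https://kinesis.us-east-1.amazonaws.com",
        Seconds(15), InitialPositionInStream.LATEST,
        StorageLevel.MEMORY_AND_DISK_2)
    }
    val unified = ssc.union(streams)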
On Fri, May 22, 2015 at 1:24 PM, Mike Trienis mike.trie...@orcsol.com
wrote:
Hi All,
I have a cluster of four nodes (three workers and one master, with one core
each) which consumes data
when you do this?
I saw a lot of "lease not owned by this Kinesis Client" type errors,
from what I remember.
lemme know!
-Chris
On May 8, 2015, at 4:36 PM, Mike Trienis mike.trie...@orcsol.com wrote:
- [Kinesis stream name]: The Kinesis stream that this streaming
application
. If you see errors, you may need to manually delete
the DynamoDB table.
On Fri, May 8, 2015 at 2:06 PM, Mike Trienis mike.trie...@orcsol.com
wrote:
Hi All,
I am submitting the assembled fat jar file by the command:
bin/spark-submit --jars /spark-streaming-kinesis-asl_2.10-1.3.0.jar
Hi All,
I am submitting the assembled fat jar file by the command:
bin/spark-submit --jars /spark-streaming-kinesis-asl_2.10-1.3.0.jar --class
com.xxx.Consumer -0.1-SNAPSHOT.jar
It reads the data from Kinesis using the stream name defined in a
configuration file. It turns out that it
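The message is cut off here; for reference, a minimal sketch of reading a
stream name from a bundled configuration file with Typesafe Config (the key
name is hypothetical):

    import com.typesafe.config.ConfigFactory

    // Loads application.conf from the classpath of the assembled jar
    val config = ConfigFactory.load()
    val streamName = config.getString("kinesis.streamName")  // hypothetical key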
with no success :(
Would be curious to know if you got it working.
Vadim
On Apr 13, 2015, at 9:36 PM, Mike Trienis mike.trie...@orcsol.com wrote:
Hi All,
I am having trouble building a fat jar file through sbt-assembly.
[warn] Merging 'META-INF/NOTICE.txt' with strategy 'rename'
[warn
a similar situation.
I hope that gives some ideas for resolving your issue.
Regards,
Rich
On Tue, Apr 14, 2015 at 1:14 PM, Mike Trienis mike.trie...@orcsol.com
wrote:
Hi Vadim,
After removing "provided" from "org.apache.spark" %%
"spark-streaming-kinesis-asl" I ended up with a huge number
Hi All,
I am having trouble building a fat jar file through sbt-assembly.
[warn] Merging 'META-INF/NOTICE.txt' with strategy 'rename'
[warn] Merging 'META-INF/NOTICE' with strategy 'rename'
[warn] Merging 'META-INF/LICENSE.txt' with strategy 'rename'
[warn] Merging 'META-INF/LICENSE' with
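A minimal build.sbt sketch of the usual way to resolve these META-INF
conflicts with sbt-assembly (key names follow the 0.13-era plugin; adjust to
your version):

    assemblyMergeStrategy in assembly := {
      // License/notice files conflict across dependencies and are safe to drop
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case x =>
        val oldStrategy = (assemblyMergeStrategy in assembly).value
        oldStrategy(x)
    }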
got it working.
Vadim
On Apr 13, 2015, at 9:36 PM, Mike Trienis mike.trie...@orcsol.com wrote:
Hi All,
I am having trouble building a fat jar file through sbt-assembly.
[warn] Merging 'META-INF/NOTICE.txt' with strategy 'rename'
[warn] Merging 'META-INF/NOTICE' with strategy 'rename
It's because your tests are running in parallel and you can only have one
context running at a time.
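A minimal build.sbt sketch of that fix (sbt 0.13-era key): run the tests
serially so only one SparkContext is alive at a time.

    parallelExecution in Test := false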
From: Mike Trienis mike.trie...@orcsol.com
Sent: Wednesday, March 18, 2015 2:45 PM
Subject: Spark Streaming S3 Performance Implications
To: user@spark.apache.org
Hi All,
I am pushing data from a Kinesis stream to S3 using Spark Streaming and
noticed that during
Hi All,
I am pushing data from a Kinesis stream to S3 using Spark Streaming and
noticed that during testing (i.e. master=local[2]) the batches (1 second
intervals) were falling behind the incoming data stream at about 5-10
events/second. It seems that the rdd.saveAsTextFile("s3n://...") is taking
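One common mitigation, sketched under the assumption that each batch writes
many small objects and that stream is the input DStream (bucket and prefix
are placeholders):

    // Fewer, larger S3 writes per batch; skip empty batches entirely
    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        rdd.coalesce(1).saveAsTextFile(
          s"s3n://bucket/prefix/${System.currentTimeMillis}")
      }
    }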
Please ignore my question; you can simply specify the root directory, and it
looks like Redshift takes care of the rest.
copy mobile
from 's3://BUCKET_NAME/'
credentials
json 's3://BUCKET_NAME/jsonpaths.json'
On Thu, Mar 5, 2015 at 3:33 PM, Mike Trienis mike.trie...@orcsol.com
wrote:
Hi
Hi All,
I am receiving data from AWS Kinesis using Spark Streaming and am writing
the data collected in the dstream to s3 using output function:
dstreamData.saveAsTextFiles("s3n://XXX:XXX@/")
After running the application for several seconds, I end up with a sequence
of directories in S3 that
Hi All,
I am looking at integrating a data stream from AWS Kinesis to AWS Redshift
and since I am already ingesting the data through Spark Streaming, it seems
convenient to also push that data to AWS Redshift at the same time.
I have taken a look at the AWS Kinesis connector although I am not
Hi All,
I have Spark Streaming set up to write data to a replicated MongoDB database
and would like to understand whether there would be any issues using the
Reactive Mongo library to write directly to MongoDB. My stack is Apache Spark
sitting on top of Cassandra for the datastore, so my thinking
. February 2015 10:03
To: Paolo Platter paolo.plat...@agilelab.it
Cc: Mike Trienis mike.trie...@orcsol.com, user@spark.apache.org
Subject: Re: Datastore HDFS vs Cassandra
One additional comment I would make is that you should be careful with
updates in Cassandra; it does