Using Spark 1.2.0. Tried to register an RDD and got:
scala.MatchError: class java.util.Date (of class java.lang.Class)
I see it was resolved in https://issues.apache.org/jira/browse/SPARK-2562
(included in 1.2.0)
Anyone encountered this issue?
Thanks,
Lior
Hi Imran,
Thanks for the suggestion! Unfortunately the type does not match, but I
could write my own function that shuffles the sample, though.
On 4/17/15 9:34 PM, Imran Rashid wrote:
if you can store the entire sample for one partition in memory, I think
you just want:
val sample1 =
Looking into the work folder of the problematic application, it seems that the
application keeps creating executors, and the error log of the worker is as
below:
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException:
Unknown exception in doAs
at
Almost. Jobs don't get skipped. Stages and Tasks do if the needed results
are already available.
On Sun, Apr 19, 2015 at 3:18 PM, Denny Lee denny.g@gmail.com wrote:
The job is skipped because the results are available in memory from a
prior run. More info at:
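A minimal sketch of when this shows up (hypothetical Java example, assuming
Java 8 lambdas): the second action reuses the first job's shuffle output, so
the web UI marks that stage of the second job as skipped.
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;
public class SkippedStagesExample {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext(
        new SparkConf().setAppName("skipped-stages").setMaster("local[2]"));
    JavaPairRDD<String, Integer> counts = sc
        .parallelize(Arrays.asList("a", "b", "a", "c"))
        .mapToPair(s -> new Tuple2<>(s, 1))
        .reduceByKey((x, y) -> x + y);
    counts.count();   // job 1 runs the shuffle-map stage and writes shuffle files
    counts.collect(); // job 2 reuses those shuffle files, so its map stage is skipped
    sc.stop();
  }
}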
Thanks for the correction Mark :)
On Sun, Apr 19, 2015 at 3:45 PM Mark Hamstra m...@clearstorydata.com
wrote:
Almost. Jobs don't get skipped. Stages and Tasks do if the needed
results are already available.
On Sun, Apr 19, 2015 at 3:18 PM, Denny Lee denny.g@gmail.com wrote:
The job
At the record reader level you can pass the file name as the key or value:
sc.newAPIHadoopRDD(job.getConfiguration,
  classOf[AvroKeyInputFormat[myObject]],
  classOf[AvroKey[myObject]],
  classOf[Text]) // can carry your file name
AvroKeyInputFormat extends
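If the goal is just to know which file each record came from and the files are
small, a simpler (hypothetical) alternative to a custom record reader is
sc.wholeTextFiles, which already yields (path, content) pairs:
import org.apache.spark.api.java.JavaPairRDD;
// sc is an existing JavaSparkContext. Each element is (full file path,
// entire file contents), so the file name travels with the data. Only
// suitable when each file fits in memory as a single record.
JavaPairRDD<String, String> files = sc.wholeTextFiles("hdfs:///path/to/input");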
In the web UI I can see some jobs marked as 'skipped'. What does that mean? Why
are these jobs skipped? Do they ever get executed?
Regards
jk
In fact you can return “NULL” from your initial map and hence not resort to
Optional<String> at all
From: Evo Eftimov [mailto:evo.efti...@isecc.com]
Sent: Sunday, April 19, 2015 9:48 PM
To: 'Steve Lewis'
Cc: 'Olivier Girardot'; 'user@spark.apache.org'
Subject: RE: Can a map function return
Hi all,
I have been testing GraphX on the soc-LiveJournal1 network from the SNAP
repository. Currently I am running on c3.8xlarge EC2 instances on Amazon.
These instances have 32 cores and 60GB RAM per node, and so far I have run
SSSP, PageRank, and WCC on a 1, 4, and 8 node cluster.
The issues
Well, you can do another map to turn Optional<String> into String: in the
cases when the Optional is empty you can store e.g. “NULL” as the value of the
RDD element.
If this is not acceptable (based on the objectives of your architecture) and IF
returning plain null instead of Optional does
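A minimal sketch of that second map (names are illustrative; Optional here is
Guava's, as used by Spark's Java API in 1.x):
import com.google.common.base.Optional;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
JavaRDD<Optional<String>> maybeWords = ... // output of the initial map
JavaRDD<String> words = maybeWords.map(new Function<Optional<String>, String>() {
  @Override
  public String call(Optional<String> s) throws Exception {
    // store the literal "NULL" when the Optional is empty
    return s.isPresent() ? s.get() : "NULL";
  }
});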
Hi Steve
I did Spark 1.3.0 PageRank benchmarking on soc-LiveJournal1 in a 4-node
cluster with 16, 16, 8, 8 GB RAM respectively. The cluster has 4 workers,
including the master, with 4, 4, 2, 2 CPUs.
I set executor memory to 3g and driver memory to 5g.
No. of Iterations -- GraphX(mins)
1 -- 1
2
The easiest way to do that is to use a similarity metric between the
different user factors.
On Sat, Apr 18, 2015 at 7:49 AM, riginos samarasrigi...@gmail.com wrote:
Is there any way that I can see the similarity table of 2 users in that
algorithm? By that I mean the similarity between 2 users.
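For instance, a minimal sketch (hypothetical, in Java) of comparing two users
by cosine similarity of their latent-factor vectors, which would come from
MatrixFactorizationModel.userFeatures() after training ALS:
// Cosine similarity between two users' latent-factor vectors. In MLlib the
// vectors would be looked up from MatrixFactorizationModel.userFeatures().
public static double cosineSimilarity(double[] a, double[] b) {
  double dot = 0.0, normA = 0.0, normB = 0.0;
  for (int i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}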
I am exploring Spark SQL and DataFrames and trying to create an aggregation
by column and generate a single JSON row with the aggregation. Any input on
the right approach will be helpful.
Here is my sample data
user,sports,major,league,count
[test1,Sports,Switzerland,NLA,6]
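One possible approach (a sketch, assuming the Spark 1.3 DataFrame API; the
load call and column names are illustrative): group by the column, aggregate,
and let toJSON produce one JSON string per aggregated row.
import static org.apache.spark.sql.functions.sum;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
SQLContext sqlContext = new SQLContext(sc); // sc: an existing JavaSparkContext
DataFrame df = sqlContext.load("data.json", "json"); // illustrative source
// One aggregated row per user, serialized as one JSON string per row.
DataFrame agg = df.groupBy("user").agg(sum("count").as("total"));
for (String json : agg.toJSON().toJavaRDD().collect()) {
  System.out.println(json); // e.g. {"user":"test1","total":6}
}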
So you imagine something like this:
JavaRDD<String> words = ...
JavaRDD<Optional<String>> wordsFiltered = words.map(new
Function<String, Optional<String>>() {
  @Override
  public Optional<String> call(String s) throws Exception {
    if (s.length() % 2 == 1) // drop strings of odd length
Hi All
Getting the following error when I am compiling Spark. What did I miss? Even
googled and did not find an exact solution for this...
[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-shade-plugin:2.2:shade (default) on project
spark-assembly_2.10: Error creating shaded jar:
What JDK release are you using?
Can you give the complete command you used?
Which Spark branch are you working with?
Cheers
On Sun, Apr 19, 2015 at 7:25 PM, Brahma Reddy Battula
brahmareddy.batt...@huawei.com wrote:
Hi All
Getting the following error when I am compiling Spark. What did I
Hi Cesar,
Can you try 1.3.1 (
https://spark.apache.org/releases/spark-release-1-3-1.html) and see if it
still shows the error?
Thanks,
Yin
On Fri, Apr 17, 2015 at 1:58 PM, Reynold Xin r...@databricks.com wrote:
This is strange. cc the dev list since it might be a bug.
On Thu, Apr 16,
The problem is the code you use to test:
sc.parallelize(List(1, 2, 3)).map(throw new SparkException("test")).collect();
is like the following example:
def foo: Int => Nothing = {
  throw new SparkException("test")
}
sc.parallelize(List(1, 2, 3)).map(foo).collect();
So actually the Spark jobs do not even start: the exception is thrown on the
driver when foo is evaluated, before anything is submitted to the cluster.
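To actually raise the exception inside a task, the throw has to sit inside the
function body that the executors run; a sketch of the same test via the Java
API (illustrative):
import java.util.Arrays;
import org.apache.spark.SparkException;
import org.apache.spark.api.java.function.Function;
// sc: an existing JavaSparkContext
sc.parallelize(Arrays.asList(1, 2, 3)).map(new Function<Integer, Integer>() {
  @Override
  public Integer call(Integer i) throws Exception {
    // thrown while the task runs on an executor, not while building the job
    throw new SparkException("test");
  }
}).collect();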
Thanks Shixiong. I'll try this.
On Sun, Apr 19, 2015, 7:36 PM Shixiong Zhu zsxw...@gmail.com wrote:
The problem is the code you use to test:
sc.parallelize(List(1, 2, 3)).map(throw new SparkException("test")).collect();
is like the following example:
def foo: Int => Nothing = {
throw
Hey Todd
Thanks a lot for your reply... Kindly check the following details:
Spark version: 1.1.0
JDK: jdk1.7.0_60
Command: mvn -Pbigtop-dist -Phive -Pyarn -Phadoop-2.4
-Dhadoop.version=V100R001C00 -DskipTests package
Thanks & Regards
Brahma Reddy Battula
You need to access the underlying RDD with .rdd() and cast that. That
works for me.
On Mon, Apr 20, 2015 at 4:41 AM, RimBerry
truonghoanglinhk55b...@gmail.com wrote:
Hi everyone,
I am trying to use the direct approach in streaming-kafka-integration
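If the cast in question is the one from the Kafka direct-approach docs, a
sketch (assuming messages is the JavaPairInputDStream returned by
createDirectStream) looks like:
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.kafka.HasOffsetRanges;
import org.apache.spark.streaming.kafka.OffsetRange;
messages.foreachRDD(new Function<JavaPairRDD<String, String>, Void>() {
  @Override
  public Void call(JavaPairRDD<String, String> rdd) {
    // JavaPairRDD wraps a Scala RDD; only the underlying RDD produced by the
    // direct stream implements HasOffsetRanges, hence rdd.rdd() plus the cast.
    OffsetRange[] offsets = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
    for (OffsetRange o : offsets) {
      System.out.println(o.topic() + " " + o.partition() + ": "
          + o.fromOffset() + " -> " + o.untilOffset());
    }
    return null;
  }
});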
Brahma, since you can see the continuous integration builds are
passing, it's got to be something specific to your environment, right?
This is not even an error from Spark, but from Maven plugins.
On Mon, Apr 20, 2015 at 4:42 AM, Ted Yu yuzhih...@gmail.com wrote:
bq. -Dhadoop.version=V100R001C00
Generally, what tools are used to schedule Spark jobs in production?
How is Spark Streaming code deployed?
I am interested in knowing the tools used, like cron, oozie, etc.
Thanks,
Arun
Hi everyone,
I am trying to use the direct approach in streaming-kafka-integration
http://spark.apache.org/docs/latest/streaming-kafka-integration.html
pulling data from Kafka as follows:
JavaPairInputDStream<String, String> messages =
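For reference, a sketch of how that call is typically completed per the
integration guide (broker and topic values are placeholders):
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import kafka.serializer.StringDecoder;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.kafka.KafkaUtils;
Map<String, String> kafkaParams = new HashMap<String, String>();
kafkaParams.put("metadata.broker.list", "broker1:9092"); // placeholder
Set<String> topics = new HashSet<String>();
topics.add("my-topic"); // placeholder
JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
    jssc, // an existing JavaStreamingContext
    String.class, String.class,
    StringDecoder.class, StringDecoder.class,
    kafkaParams, topics);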
Hi, if you have just one physical machine then I would try out Docker
instead of a full VM (that would be a waste of memory and CPU).
Best regards
On 20 Apr 2015 00:11, hnahak harihar1...@gmail.com wrote:
Hi All,
I've a big physical machine with 16 CPUs, 256 GB RAM, and a 20 TB hard disk. I
just
need
Thanks a lot for your replies.
@Ted, V100R001C00 is our internal Hadoop version, which is based on Hadoop
2.4.1.
@Sean Owen, yes, you are correct... I just wanted to know what leads to this
problem...
Thanks & Regards
Brahma Reddy Battula
From: Sean
I am experiencing a problem with Spark Streaming (Spark 1.2.0): the onStart
method is never called on my CustomReceiver when calling spark-submit against
a master node with multiple workers. However, Spark Streaming works fine with
no master node set. Has anyone noticed this issue?
On 20 Apr 2015 05:45, Arun Patel arunp.bigd...@gmail.com wrote:
http://23.251.129.190:8090/spark-twitter-streaming-web/analysis/3fb28f76-62fe-47f3-a1a8-66ac610c2447.html
spark jobs in production?
How is Spark Streaming code deployed?
I am interested in knowing the tools used like cron,
That's right.
On Sun, Apr 19, 2015 at 8:59 AM, Arun Patel arunp.bigd...@gmail.com wrote:
Thanks Ted.
So, the operations I am performing now are on DataFrames and not
SchemaRDDs? Is that right?
Regards,
Venkat
On Sun, Apr 19, 2015 at 9:13 AM, Ted Yu yuzhih...@gmail.com wrote:
bq.
bq. SchemaRDD is not existing in 1.3?
That's right.
See this thread for more background:
http://search-hadoop.com/m/JW1q5zQ1Xw/spark+DataFrame+schemarddsubj=renaming+SchemaRDD+gt+DataFrame
On Sat, Apr 18, 2015 at 5:43 PM, Abhishek R. Singh
abhis...@tetrationanalytics.com wrote:
I am no
Thanks Ted.
So, the operations I am performing now are on DataFrames and not
SchemaRDDs? Is that right?
Regards,
Venkat
On Sun, Apr 19, 2015 at 9:13 AM, Ted Yu yuzhih...@gmail.com wrote:
bq. SchemaRDD is not existing in 1.3?
That's right.
See this thread for more background:
Here's a code example:
public class DateSparkSQLExample {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("test").setMaster("local");
    JavaSparkContext sc = new JavaSparkContext(conf);
    List<SomeObject> itemsList =