This inevitably means the run-time classpath includes a different copy
of the same library/class as something in your uber jar, and the
different version is taking precedence. Here it's Apache
HttpComponents. Where exactly it's coming from is specific to your
deployment, but that's the issue.
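One quick way to confirm this at run time is to ask the JVM which jar a
suspect class was actually loaded from. A minimal sketch from a PySpark
shell (assumes sc is defined; sc._jvm is PySpark's internal py4j gateway,
and the class name below is only an example of a conflicting class):

# Ask the JVM which jar a class was loaded from (run in a PySpark shell).
# sc._jvm is PySpark's internal py4j gateway; the class name is an example.
cls = sc._jvm.java.lang.Class.forName("org.apache.http.entity.ContentType")
src = cls.getProtectionDomain().getCodeSource()
print src.getLocation() if src is not None else "bootstrap classpath"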
Consider the following simple zip:
n = 6
a = sc.parallelize(range(n))
b = sc.parallelize(range(n)).map(lambda j: j)
c = a.zip(b)
print a.count(), b.count(), c.count()
6 6 4
By varying n, I find that c.count() is always min(n, 4), where 4 happens to
be the number of threads on my machine.
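For what it's worth, a workaround that doesn't rely on zip's partition
alignment is to join on explicit indices instead. A rough sketch, assuming
Spark 1.0+ where zipWithIndex is available:

# zipWithIndex yields (value, index); flip to (index, value) for the join
a_idx = a.zipWithIndex().map(lambda (v, i): (i, v))
b_idx = b.zipWithIndex().map(lambda (v, i): (i, v))
# join on the index, then drop it; count should now equal n
c = a_idx.join(b_idx).values()
print c.count()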
Thanks a lot, Sean! It works for me now.
Thanks a lot! Let me check my maven-shade plugin config and see if there is a
fix.
It's probably because our LEFT JOIN performance isn't super great at the
moment, since we'll use a nested loop join. Sorry! We are aware of the
problem, and there is a JIRA to let us do this with a HashJoin instead. If
you are feeling brave you might try pulling in the related PR.
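For intuition, here is a toy illustration in plain Python (not Spark's
actual implementation) of why a hash join beats a nested loop join:

left = [(1, "a"), (2, "b"), (3, "c")]
right = [(1, "x"), (3, "y")]

# nested loop join: compares every pair, O(|left| * |right|)
nested = [(l, r) for l in left for r in right if l[0] == r[0]]

# hash join: build a table on one side, probe with the other,
# O(|left| + |right|)
table = {}
for k, v in right:
    table.setdefault(k, []).append(v)
hashed = [(l, (l[0], v)) for l in left for v in table.get(l[0], [])]

print nested == hashed  # same result, very different cost at scale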
I ran the PageRank example processing a large data set, 5 GB in size, using 48
machines. The job got stuck at the time point 14/05/20 21:32:17, as the
attached log shows. It was stuck there for more than 10 hours, and in the end
I killed it. But I did not find any information explaining why it
Indeed I see a lot of duplicate-package warnings in the maven-shade assembly
package output, so I tried to eliminate them:
First I set the scope of the apache-spark dependency to 'provided', as
suggested on this page:
http://spark.apache.org/docs/latest/submitting-applications.html
But spark master
Hi,
- I have seen similar behavior before. As far as I can tell, the root
cause is an out-of-memory error; I verified this by monitoring the memory.
- I had a 30 GB file and was running on a single machine with 16 GB,
so I knew it would fail.
- But instead of raising an
Latest progress:
I found the cause of the NoClassDef exception: I wasn't using spark-submit;
instead I tried to run the Spark application directly, with SparkConf set in
the code (this is handy in local debugging). However, the old problem
remains: even my maven-shade plugin doesn't give any warning
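For reference, the "SparkConf set in the code" pattern looks roughly like
this; a sketch only (the app name is made up), fine for local debugging but
it bypasses the classpath setup that spark-submit does for you:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("local[*]")      # run locally using all cores
        .setAppName("debug-run"))   # hypothetical app name
sc = SparkContext(conf=conf)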
I also found that any buggy application submitted in --deploy-mode = cluster
mode will crash the worker (turning its status to 'DEAD'). This shouldn't
really happen; otherwise nobody would use this mode. It is yet unclear whether
all workers will crash or only the one running the driver will (as I only
Alright, added you — sorry for the delay.
Matei
On Jun 12, 2014, at 10:29 PM, Sonal Goyal sonalgoy...@gmail.com wrote:
Hi,
Can we get added too? Here are the details:
Name: Nube Technologies
URL: www.nubetech.co
Description: Nube provides solutions for data curation at scale helping
Hi Sean,
OK, I'm about 90% sure about the cause of this problem: just another classic
dependency conflict:
My project -> Selenium -> apache.httpcomponents:httpcore 4.3.1 (has
ContentType)
Spark -> Spark SQL Hive -> Hive -> Thrift -> apache.httpcomponents:httpcore
4.1.3 (has no ContentType)
Though I
Thanks a lot Matei.
On Jun 22, 2014, at 5:20 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
Alright, added you — sorry for the delay.
Matei