Hi,
I've written a short Scala app to perform word counts on a text file and am
getting the following exception as the program completes (after it prints
out all of the word counts).
Exception in thread delete Spark temp dir
C:\Users\Josh\AppData\Local\Temp\spark-0fdd0b79-7329-4690-a093
Nope, nested RDDs aren't supported:
https://groups.google.com/d/msg/spark-users/_Efj40upvx4/DbHCixW7W7kJ
https://groups.google.com/d/msg/spark-users/KC1UJEmUeg8/N_qkTJ3nnxMJ
https://groups.google.com/d/msg/spark-users/rkVPXAiCiBk/CORV5jyeZpAJ
On Sun, Mar 2, 2014 at 5:37 PM, Cosmin Radoi
or not though, so if anyone else is looking into this,
I'd love to hear their thoughts.
Josh
On Tue, Apr 8, 2014 at 1:00 PM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
Just took a quick look at the overview here:
http://phoenix.incubator.apache.org/
and the quick start guide here:
http
Hey folks,
I'm wondering what strategies other folks are using for maintaining and
monitoring the stability of stand-alone spark clusters.
Our master very regularly loses workers, and they (as expected) never
rejoin the cluster. This is the same behavior I've seen
using akka cluster (if that's
...@gmail.com
wrote:
Which version is this with? I haven’t seen standalone masters lose
workers. Is there other stuff on the machines that’s killing them, or what
errors do you see?
Matei
On May 16, 2014, at 9:53 AM, Josh Marcus
jmar...@meetup.com
, 2014 at 3:28 PM, Josh Marcus jmar...@meetup.com wrote:
We're using spark 0.9.0, and we're using it out of the box -- not using
Cloudera Manager or anything similar.
There are warnings from the master that there continue to be heartbeats
from the unregistered workers. I will see
and messages semi-regularly on CDH5 + 0.9.0. I don't have any insight
into when it happens, but usually after heavy use and after running
for a long time. I had figured I'd see if the changes since 0.9.0
addressed it and revisit later.
On Tue, May 20, 2014 at 8:37 PM, Josh Marcus jmar
Jeremy,
Just to be clear, are you assembling a jar with that class compiled (with
its dependencies) and including the path to that jar on the command line in
an environment variable (e.g. SPARK_CLASSPATH=path ./spark-shell)?
--j
On Saturday, May 24, 2014, Jeremy Lewi jer...@lewi.us wrote:
Hi
submit jobs to the cluster either.
Thanks!
Josh
You have to use `myBroadcastVariable.value` to access the broadcasted
value; see
https://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables
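A minimal sketch of that pattern (the variable names are illustrative, not from the original thread):

import org.apache.spark.{SparkConf, SparkContext}

object BroadcastValueExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-example").setMaster("local[2]"))

    // Broadcast the lookup set once to every executor.
    val idSet = Set("a", "b", "c")
    val broadcastIds = sc.broadcast(idSet)

    // Inside transformations, go through .value to reach the broadcasted object.
    val matches = sc.parallelize(Seq("a", "x", "c"))
      .filter(x => broadcastIds.value.contains(x))

    println(matches.collect().mkString(", "))
    sc.stop()
  }
}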
On Fri, Jul 18, 2014 at 2:56 PM, Vedant Dhandhania
ved...@retentionscience.com wrote:
Hi All,
I am trying to broadcast a set in a
, upper bound index, and number of partitions. With that example
query and those values, you should end up with an RDD with two partitions,
one with the student_info from 1 through 10, and the second with ids 11
through 20.
Josh
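Assuming the API being described here is JdbcRDD, a hedged sketch with a hypothetical connection string and student_info table:

import java.sql.DriverManager
import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD

// With lowerBound = 1, upperBound = 20 and numPartitions = 2, the RDD should have two
// partitions, ids 1-10 and ids 11-20, matching the description above. The query must
// contain two '?' placeholders, which JdbcRDD fills in per partition.
def studentInfo(sc: SparkContext): JdbcRDD[String] = new JdbcRDD(
  sc,
  () => DriverManager.getConnection("jdbc:mysql://localhost/school"),
  "SELECT id, name FROM student_info WHERE id >= ? AND id <= ?",
  lowerBound = 1,
  upperBound = 20,
  numPartitions = 2,
  mapRow = rs => rs.getString("name")
)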
On Wed, Jul 30, 2014 at 6:58 PM, chaitu reddy chaitzre...@gmail.com
Has anyone tried using functools.partial (
https://docs.python.org/2/library/functools.html#functools.partial) with
PySpark? If it works, it might be a nice way to address this use-case.
On Sun, Aug 17, 2014 at 7:35 PM, Davies Liu dav...@databricks.com wrote:
On Sun, Aug 17, 2014 at 11:21 AM,
Hi,
What's the difference between amplab docker
https://github.com/amplab/docker-scripts and spark docker
https://github.com/apache/spark/tree/master/docker?
Thanks,
Josh
windowMessages1 =
messages.window(windowLength,slideInterval);
JavaPairDStream<String,String> windowMessages2 =
messages.window(windowLength,slideInterval);
Thanks,
Josh
DStream. How can I accomplish this with spark?
Sincerely,
Josh
Hi,
Hopefully a simple question. Though is there an example of where to save
the output of countByWindow ? I would like to save the results to external
storage (kafka or redis). The examples show only stream.print()
Thanks,
Josh
of countByWindow with a
function that performs the save operation.
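A minimal sketch of that suggestion, assuming a DStream[String] named `messages` and a hypothetical saveToStore() helper that writes to Kafka or Redis:

import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.dstream.DStream

// countByWindow emits one Long per window; foreachRDD is the output operation where the
// save to external storage happens. Note that countByWindow requires checkpointing to be
// enabled on the StreamingContext.
def saveWindowCounts(messages: DStream[String], saveToStore: Long => Unit): Unit = {
  val counts: DStream[Long] = messages.countByWindow(Seconds(30), Seconds(10))
  counts.foreachRDD { rdd =>
    rdd.collect().foreach(saveToStore)  // each RDD holds a single windowed count
  }
}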
On Fri, Aug 22, 2014 at 1:58 AM, Josh J joshjd...@gmail.com wrote:
Hi,
Hopefully a simple question. Though is there an example of where to save
the output of countByWindow ? I would like to save the results to external
storage (kafka
If I recall, you should be able to start Hadoop MapReduce using
~/ephemeral-hdfs/sbin/start-mapred.sh.
On Sun, Sep 7, 2014 at 6:42 AM, Tomer Benyamini tomer@gmail.com wrote:
Hi,
I would like to copy log files from s3 to the cluster's
ephemeral-hdfs. I tried to use distcp, but I guess
/2144
- Josh
On Fri, Oct 3, 2014 at 6:44 PM, tomo cocoa cocoatom...@gmail.com wrote:
Hi,
I would prefer that PySpark can also be executed on Python 3.
Do you have a specific reason or requirement for using PySpark with Python 3?
If you create an issue on JIRA, I would try to resolve it.
On 4 October
drivers
and workers.
- Josh
On Fri, Oct 10, 2014 at 5:24 PM, Andy Davidson
a...@santacruzintegration.com wrote:
Hi
I am running spark on an ec2 cluster. I need to update python to 2.7. I
have been following the directions on
http://nbviewer.ipython.org/gist/JoshRosen/6856670
https
Hi Theo,
Check out *spark-perf*, a suite of performance benchmarks for Spark:
https://github.com/databricks/spark-perf.
- Josh
On Fri, Oct 10, 2014 at 7:27 PM, Theodore Si sjyz...@gmail.com wrote:
Hi,
Let's say that I managed to port Spark from TCP/IP to RDMA.
What tool or benchmark can I
Hi,
How can I join neighbor sliding windows in spark streaming?
Thanks,
Josh
Hi,
Are there Dockerfiles available for setting up a Docker Spark 1.1.0
cluster?
Thanks,
Josh
than once in the event of a worker
failure.
http://spark.apache.org/docs/latest/streaming-programming-guide.html#failure-of-a-worker-node
Thanks,
Josh
Hi,
How can I combine RDDs? I would like to combine two RDDs if the count in
an RDD is not above some threshold.
Thanks,
Josh
Hi,
How do I run multiple Spark applications in parallel? I tried running on a
YARN cluster, though the second application submitted does not run.
Thanks,
Josh
, Intel(R) Xeon(R) CPU E5-2420 v2 @ 2.20GHz
32 GB RAM
Thanks,
Josh
On Tue, Oct 28, 2014 at 4:15 PM, Soumya Simanta soumya.sima...@gmail.com
wrote:
Try reducing the resources (cores and memory) of each application.
On Oct 28, 2014, at 7:05 PM, Josh J joshjd...@gmail.com wrote:
Hi,
How
Hi,
Is there a nice or optimal method to randomly shuffle spark streaming RDDs?
Thanks,
Josh
? in general RDDs don't have
ordering at all -- excepting when you sort for example -- so a
permutation doesn't make sense. Do you just want a well-defined but
random ordering of the data? Do you just want to (re-)assign elements
randomly to partitions?
On Mon, Nov 3, 2014 at 4:33 PM, Josh J joshjd
is guaranteed about that.
If you want to permute an RDD, how about a sortBy() on a good hash
function of each value plus some salt? (Haven't thought this through
much but sounds about right.)
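A rough sketch of that salted-hash idea (an illustration, not code from the thread):

import scala.util.Random
import org.apache.spark.rdd.RDD

// Assign every element a pseudo-random sort key derived from its hash and a per-run salt,
// then sortBy that key. The result is a well-defined but effectively random ordering.
def randomPermutation[T](rdd: RDD[T]): RDD[T] = {
  val salt = Random.nextLong()
  rdd.sortBy(x => x.hashCode().toLong ^ salt)
}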
On Mon, Nov 3, 2014 at 4:59 PM, Josh J joshjd...@gmail.com wrote:
When I'm outputting the RDDs
: Ordering[K], implicit ctag:
scala.reflect.ClassTag[K])org.apache.spark.rdd.RDD[String].
Unspecified value parameter f.
On Tue, Nov 4, 2014 at 11:28 AM, Josh J joshjd...@gmail.com wrote:
Hi,
Does anyone have any good examples of using sortBy for RDDs in Scala?
I'm receiving
not enough
)
and
found : java.util.LinkedList[org.apache.spark.rdd.RDD[String]]
required: scala.collection.mutable.Queue[org.apache.spark.rdd.RDD[?]]
Thanks,
Josh
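A small self-contained sortBy example for the question above (illustrative only); the first argument is the key function f that the "Unspecified value parameter f" error refers to:

import org.apache.spark.{SparkConf, SparkContext}

object SortByExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sortBy-example").setMaster("local[2]"))
    val words = sc.parallelize(Seq("pear", "apple", "banana"))

    val byLength = words.sortBy(w => w.length)                 // ascending by word length
    val reverseAlpha = words.sortBy(w => w, ascending = false) // descending lexicographic

    println(byLength.collect().mkString(", "))
    println(reverseAlpha.collect().mkString(", "))
    sc.stop()
  }
}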
Hi,
Is it possible to concatenate or append two Dstreams together? I have an
incoming stream that I wish to combine with data that's generated by a
utility. I then need to process the combined Dstream.
Thanks,
Josh
I think it's just called union
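A minimal sketch of that suggestion, assuming both streams carry the same element type:

import org.apache.spark.streaming.dstream.DStream

// Merge the incoming stream with the utility-generated stream into a single DStream,
// then apply the downstream processing to the combined result.
def combine(incoming: DStream[String], generated: DStream[String]): DStream[String] =
  incoming.union(generated)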
On Tue, Nov 11, 2014 at 2:41 PM, Josh J joshjd...@gmail.com wrote:
Hi,
Is it possible to concatenate or append two Dstreams together? I have an
incoming stream that I wish to combine with data that's generated by a
utility. I then need to process the combined
Hi,
I was wondering whether the adaptive stream processing and dynamic batch
processing are available to use in Spark Streaming? Could someone help
point me in the right direction?
Thanks,
Josh
Referring to this paper http://dl.acm.org/citation.cfm?id=2670995.
On Fri, Nov 14, 2014 at 10:42 AM, Josh J joshjd...@gmail.com wrote:
Hi,
I was wondering if the adaptive stream processing and dynamic batch
processing was available to use in spark streaming? If someone could help
point me
:
https://github.com/simplymeasured/phoenix-spark
Josh
On Fri, Nov 21, 2014 at 4:14 PM, Alaa Ali contact.a...@gmail.com wrote:
I want to run queries on Apache Phoenix which has a JDBC driver. The query
that I want to run is:
select ts,ename from random_data_date limit 10
But I'm having issues
also do a
lot more with it than just the Phoenix functions provide.
I don't know if this works with PySpark or not, but assuming the
'newHadoopRDD' functionality works for other input formats, it should work
for Phoenix as well.
Josh
On Fri, Nov 21, 2014 at 5:12 PM, Alaa Ali contact.a...@gmail.com
can maintain exactly once
semantics when writing to topic 2?
Thanks,
Josh
Is there a way to do this that preserves exactly once semantics for the
write to Kafka?
On Tue, Sep 2, 2014 at 12:30 PM, Tim Smith secs...@gmail.com wrote:
I'd be interested in finding the answer too. Right now, I do:
val kafkaOutMsgs = kafkInMessages.map(x => myFunc(x._2, someParam))
SerializableMapWrapper was added in
https://issues.apache.org/jira/browse/SPARK-3926; do you mind opening a new
JIRA and linking it to that one?
On Mon, Dec 1, 2014 at 12:17 AM, lokeshkumar lok...@dataken.net wrote:
The workaround was to wrap the map returned by spark libraries into HashMap
by the mailing list.
I wanted to mention this issue to the Spark community to see whether there
are any good solutions to address this. I have spoken to users who think
that our mailing list is unresponsive / inactive because their un-posted
messages haven't received any replies.
- Josh
will be sent to both spark.incubator.apache.org and spark.apache.org (if
that is the case, I'm not sure which alias nabble posts get sent to) would
make things a lot clearer.
On Sat, Dec 13, 2014 at 5:05 PM, Josh Rosen rosenvi...@gmail.com wrote:
I've noticed that several users are attempting to post
a bit of additional context in the meantime.
- Josh
On Thu, Dec 25, 2014 at 5:36 PM, Tobias Pfeiffer t...@preferred.jp wrote:
Nick,
uh, I would have expected a rather heated discussion, but the opposite
seems to be the case ;-)
Independent of my personal preferences w.r.t. usability, habits etc
/ stage / task progress information, as well as expanding the
types of information exposed through the stable status API interface.
- Josh
On Thu, Dec 25, 2014 at 10:01 AM, Eric Friedman eric.d.fried...@gmail.com
wrote:
Spark 1.2.0 is SO much more usable than previous releases -- many thanks
).map(_._2)
streamtoread.sample(withReplacement = true, fraction = fraction)
How do I use the sample() method
(http://spark.apache.org/docs/latest/programming-guide.html#transformations)
with Spark Streaming?
Thanks,
Josh
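One common way to apply sample() per batch, sketched here only as an illustration, is to go through DStream.transform():

import org.apache.spark.streaming.dstream.DStream

// transform() applies an arbitrary RDD-to-RDD function to every batch, so the RDD
// sample() transformation can be reused even though DStream has no sample() of its own.
def sampleStream(stream: DStream[String], fraction: Double): DStream[String] =
  stream.transform(rdd => rdd.sample(withReplacement = true, fraction = fraction))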
Josh
Is there documentation available for status API? I would like to use it.
Thanks,
Aniket
On Sun Dec 28 2014 at 02:37:32 Josh Rosen rosenvi...@gmail.com wrote:
The console progress bars are implemented on top of a new stable status
API that was added in Spark 1.2. It's possible
Hi Sven,
Do you have a small example program that you can share which will allow me
to reproduce this issue? If you have a workload that runs into this, you
should be able to keep iteratively simplifying the job and reducing the
data set size until you hit a fairly minimal reproduction (assuming
To configure the Python executable used by PySpark, see the Using the
Shell Python section in the Spark Programming Guide:
https://spark.apache.org/docs/latest/programming-guide.html#using-the-shell
You can set the PYSPARK_PYTHON environment variable to choose the Python
executable that will be
Which version of Spark are you using?
On Wed, Dec 31, 2014 at 10:24 PM, rapelly kartheek kartheek.m...@gmail.com
wrote:
Hi,
I get the following exception when I submit a Spark application that
calculates the frequency of characters in a file. Especially when I
increase the size of the data, I
:04 PM, Josh Rosen rosenvi...@gmail.com wrote:
Which version of Spark are you using?
On Wed, Dec 31, 2014 at 10:24 PM, rapelly kartheek
kartheek.m...@gmail.com wrote:
Hi,
I get this following Exception when I submit spark application that
calculates the frequency of characters in a file
This log message is normal; in this case, this message is saying that the
final stage needed to compute your job does not have any dependencies /
parent stages and that there are no parent stages that need to be computed.
On Thu, Jan 1, 2015 at 11:02 PM, shahid sha...@trialx.com wrote:
hi guys
Which version of Spark are you using? It seems like the issue here is that
the map output statuses are too large to fit in the Akka frame size. This
issue has been fixed in Spark 1.2 by using a different encoding for map
outputs for jobs with many reducers (
Do you mind filing a JIRA issue for this which includes the actual error
message string that you saw? https://issues.apache.org/jira/browse/SPARK
On Thu, Jan 22, 2015 at 8:31 AM, Yana Kadiyska yana.kadiy...@gmail.com
wrote:
I am not sure if you get the same exception as I do --
There's an open PR for supporting yarn-cluster mode in PySpark:
https://github.com/apache/spark/pull/3976 (currently blocked on reviewer
attention / time)
On Wed, Jan 14, 2015 at 3:16 PM, Marcelo Vanzin van...@cloudera.com wrote:
As the error message says...
On Wed, Jan 14, 2015 at 3:14 PM,
This looks like a bug in the master branch of Spark, related to some recent
changes to EventLoggingListener. You can reproduce this bug on a fresh
Spark checkout by running
./bin/spark-shell --conf spark.eventLog.enabled=true --conf
spark.eventLog.dir=/tmp/nonexistent-dir
where
We have dockerized Spark Master and worker(s) separately and are using it
in
our dev environment.
Is this setup available on github or dockerhub?
On Tue, Dec 9, 2014 at 3:50 PM, Venkat Subramanian vsubr...@gmail.com
wrote:
We have dockerized Spark Master and worker(s) separately and are
Hi,
I have a stream pipeline which invokes map, reduceByKey, filter, and
flatMap. How can I measure the time taken in each stage?
Thanks,
Josh
I'm not sure how to confirm how the moving is happening, however, one of
the jobs just completed that I was talking about with 9k files of 4mb each.
Spark UI showed the job being complete after ~2 hours. The last four hours
of the job was just moving the files from _temporary to their final
thoughts and actually very curious about how others
are running Spark on Mesos with large heaps (as a result of large
memory machines). Perhaps this is a non-issue when we have more
multi-tenancy in the cluster, but for now, this is not the case.
Thanks,
Josh
On 24 December 2014 at 06:22, Tim Chen
I've got a data set of activity by user. For each user, I'd like to train a
decision tree model. I currently have the feature creation step implemented
in Spark and would naturally like to use mllib's decision tree model.
However, it looks like the decision tree model expects the whole RDD and
(data) but just to deal with it on whatever spark worker is
handling kvp? Does that question make sense?
Thanks!
Josh
On Sun, Jan 11, 2015 at 4:12 AM, Sean Owen so...@cloudera.com wrote:
You just mean you want to divide the data set into N subsets, and do
that dividing by user, not make one
are using RDDs inside RDDs. But I
am also not sure you should do what it looks like you are trying to do.
On Jan 13, 2015 12:32 AM, Josh Buffum jbuf...@gmail.com wrote:
Sean,
Thanks for the response. Is there some subtle difference between one
model partitioned by N users or N models per each 1 user
Hi,
I'm trying to run Spark Streaming standalone on two nodes. I'm able to run
on a single node fine. I start both workers and it registers in the Spark
UI. However, the application says
SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
Any ideas?
Thanks,
Josh
hard to say from
this error trace alone.
On December 30, 2014 at 5:17:08 PM, Sven Krasser (kras...@gmail.com) wrote:
Hey Josh,
I am still trying to prune this to a minimal example, but it has been tricky
since scale seems to be a factor. The job runs over ~720GB of data (the
cluster's total RAM
fix. In the meantime, I recommend that you increase your
Akka frame size.
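As a sketch, the workaround amounts to one SparkConf setting (the value is in MB; 128 is only an example):

import org.apache.spark.SparkConf

// spark.akka.frameSize caps the size of control messages (such as map output statuses)
// exchanged between driver and executors in these pre-2.0 Spark versions.
val conf = new SparkConf()
  .setAppName("large-shuffle-job")
  .set("spark.akka.frameSize", "128")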
On Sat, Jan 3, 2015 at 8:51 PM, Saeed Shahrivari saeed.shahriv...@gmail.com
wrote:
I use the 1.2 version.
On Sun, Jan 4, 2015 at 3:01 AM, Josh Rosen rosenvi...@gmail.com wrote:
Which version of Spark are you using
@Brad, I'm guessing that the additional memory usage is coming from the
shuffle performed by coalesce, so that at least explains the memory blowup.
On Sun, Jan 4, 2015 at 10:16 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
You can try:
- Using KryoSerializer
- Enabling RDD Compression
-
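A sketch of the first two suggestions as SparkConf settings (property names as documented for Spark 1.x):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("memory-tuning-example")
  // Use Kryo instead of Java serialization for shuffle and cached data.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Compress serialized RDD partitions (e.g. for MEMORY_ONLY_SER storage).
  .set("spark.rdd.compress", "true")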
Can you please file a JIRA issue for this? This will make it easier to
triage this issue.
https://issues.apache.org/jira/browse/SPARK
Thanks,
Josh
On Thu, Jan 8, 2015 at 2:34 AM, frodo777 roberto.vaquer...@bitmonlab.com
wrote:
Hello everyone.
With respect to the configuration problem
On Fri, Feb 13, 2015 at 2:21 AM, Gerard Maas gerard.m...@gmail.com wrote:
KafkaOutputServicePool
Could you please give example code of what KafkaOutputServicePool would
look like? When I tried object pooling, I ended up with various
not-serializable exceptions.
Thanks!
Josh
at 10:29 PM, Josh J joshjd...@gmail.com wrote:
Hi,
I plan to run a parameter search varying the number of cores, epoch, and
parallelism. The web console provides a way to archive the previous runs,
though is there a way to view in the console the throughput? Rather than
logging
On Wed, Feb 25, 2015 at 7:54 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
For SparkStreaming applications, there is already a tab called Streaming
which displays the basic statistics.
Would I just need to extend this tab to add the throughput?
We (Databricks) use our own DirectOutputCommitter implementation, which is
a couple tens of lines of Scala code. The class would almost entirely be a
no-op except we took some care to properly handle the _SUCCESS file.
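A no-op committer along those lines might look like the sketch below (not the actual Databricks class; the _SUCCESS handling mentioned above is deliberately omitted):

import org.apache.hadoop.mapred.{JobContext, OutputCommitter, TaskAttemptContext}

// Tasks write directly to the final output location, so there is nothing to set up,
// commit, or abort. Register it on the JobConf with setOutputCommitter(classOf[...]).
class DirectOutputCommitter extends OutputCommitter {
  override def setupJob(jobContext: JobContext): Unit = {}
  override def setupTask(taskContext: TaskAttemptContext): Unit = {}
  override def needsTaskCommit(taskContext: TaskAttemptContext): Boolean = false
  override def commitTask(taskContext: TaskAttemptContext): Unit = {}
  override def abortTask(taskContext: TaskAttemptContext): Unit = {}
}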
On Fri, Feb 20, 2015 at 3:52 PM, Mingyu Kim m...@palantir.com wrote:
I
the logs
files to the web console processing times?
Thanks,
Josh
Do you have any more specific profiling data that you can share? I'm
curious to know where AppendOnlyMap.changeValue is being called from.
On Fri, May 8, 2015 at 1:26 PM, Michal Haris michal.ha...@visualdna.com
wrote:
+dev
On 6 May 2015 10:45, Michal Haris michal.ha...@visualdna.com wrote:
I would be cautious regarding use of spark.cleaner.ttl, as it can lead to
confusing error messages if time-based cleaning deletes resources that are
still needed. See my comment at
the PySpark unit tests locally to make sure that
the change still work correctly in older branches. I can also help with
backports / fixing conflicts.
Thanks to Davies Liu, Shane Knapp, Thom Neale, Xiangrui Meng, and everyone
else who helped with this patch.
- Josh
to continue debugging this issue, I think we should move this
discussion over to JIRA so it's easier to track and reference.
Hope this helps,
Josh
On Thu, Apr 2, 2015 at 7:34 AM, Jacek Lewandowski
jacek.lewandow...@datastax.com wrote:
A very simple example which works well with Spark 1.2
suspect that keeping all of the spark and phoenix dependencies marked as
'provided', and including the Phoenix client JAR in the Spark classpath
would work as well.
Good luck,
Josh
On Tue, Jun 9, 2015 at 4:40 AM, Jeroen Vlek j.v...@anchormen.nl wrote:
Hi,
I posted a question with regards
Josh
On Wed, Jun 10, 2015 at 4:11 AM, Jeroen Vlek j.v...@anchormen.nl wrote:
Hi Josh,
Thank you for your effort. Looking at your code, I feel that mine is
semantically the same, except written in Java. The dependencies in the
pom.xml
all have the scope provided. The job is submitted
There's a discussion of this at https://github.com/apache/spark/pull/5403
On Wed, Jun 10, 2015 at 7:08 AM, Corey Nolet cjno...@gmail.com wrote:
Is it possible to configure Spark to do all of its shuffling FULLY in
memory (given that I have enough memory to store all the data)?
Mind filing a JIRA?
On Tue, Jun 23, 2015 at 9:34 AM, Koert Kuipers ko...@tresata.com wrote:
just a heads up, i was doing some basic coding using DataFrame, Row,
StructType, etc. and i ended up with deadlocks in my sbt tests due to the
usage of
ScalaReflectionLock.synchronized in the spark
Which Spark version are you using? AFAIK the corruption bugs in sort-based
shuffle should have been fixed in newer Spark releases.
On Wed, Jun 24, 2015 at 12:25 PM, Piero Cinquegrana
pcinquegr...@marketshare.com wrote:
Switching spark.shuffle.manager from sort to hash fixed this issue as
My hunch is that you changed spark.serializer to Kryo but left
spark.closureSerializer unmodified, so it's still using Java for closure
serialization. Kryo doesn't really work as a closure serializer but
there's an open pull request to fix this:
https://github.com/apache/spark/pull/6361
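For clarity, these are two independent settings; a sketch of the distinction:

import org.apache.spark.SparkConf

// spark.serializer controls how shuffle data and cached RDDs are serialized;
// spark.closureSerializer controls how task closures are serialized and, per the note
// above, should be left at its Java default until the linked PR is merged.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")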
On Mon,
with a GMane link to the thread?
Good luck,
Josh
On Thu, Jun 11, 2015 at 2:38 AM, Jeroen Vlek j.v...@anchormen.nl wrote:
Hi Josh,
That worked! Thank you so much! (I can't believe it was something so
obvious
;) )
If you care about such a thing you could answer my question here for
bounty
--
*From:* Josh Rosen rosenvi...@gmail.com
*To:* Sanjay Subramanian sanjaysubraman...@yahoo.com
*Cc:* user@spark.apache.org user@spark.apache.org
*Sent:* Friday, June 12, 2015 7:15 AM
*Subject:* Re: spark-sql from CLI ---EXCEPTION:
java.lang.OutOfMemoryError: Java
If your job is dying due to out of memory errors in the post-shuffle stage,
I'd consider the following approach for implementing de-duplication /
distinct():
- Use sortByKey() to perform a full sort of your dataset.
- Use mapPartitions() to iterate through each partition of the sorted
dataset,
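A hedged sketch of that sort-then-scan approach, assuming duplicates are exact key matches:

import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// After sortByKey(), equal keys land in the same partition and sit next to each other,
// so each partition can be de-duplicated with a single streaming pass that keeps only
// the first occurrence of every key -- no large hash set needs to fit in memory.
def sortedDistinctByKey[V: ClassTag](rdd: RDD[(String, V)]): RDD[(String, V)] = {
  rdd.sortByKey().mapPartitions { iter =>
    var previousKey: Option[String] = None
    iter.filter { case (key, _) =>
      val keep = previousKey != Some(key)
      previousKey = Some(key)
      keep
    }
  }
}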
Sent from my phone
On Jun 11, 2015, at 8:43 AM, Sanjay Subramanian
sanjaysubraman...@yahoo.com.INVALID wrote:
hey guys
Using Hive and Impala daily intensively.
Want to transition to spark-sql in CLI mode
Currently in my sandbox I am using the Spark (standalone mode) in the CDH
It sounds like this might be caused by a memory configuration problem. In
addition to looking at the executor memory, I'd also bump up the driver memory,
since it appears that your shell is running out of memory when collecting a
large query result.
Sent from my phone
On Jun 11, 2015, at
...@gmail.com wrote:
Hi
We are using spark 1.3.1
Avro-chill (tomorrow will check if its important) we register avro
classes from java
Avro 1.7.6
On May 31, 2015 22:37, Josh Rosen rosenvi...@gmail.com wrote:
Which Spark version are you using? I'd like to understand whether this
change could
If you can't run a patched Spark version, then you could also consider
using LZF compression instead, since that codec isn't affected by this bug.
On Mon, Jun 1, 2015 at 3:32 PM, Andrew Or and...@databricks.com wrote:
Hi Deepak,
This is a notorious bug that is being tracked at
enough to spill data to disk.
We will work on it to understand and reproduce the problem (not first
priority though...)
On 1 June 2015 at 23:02, Josh Rosen rosenvi...@gmail.com wrote:
How much work is to produce a small standalone reproduction? Can you
create an Avro file with some mock
My suggestion is that you change the Spark setting which controls the
compression codec that Spark uses for internal data transfers. Set
spark.io.compression.codec
to lzf in your SparkConf.
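The suggested change as a one-line sketch:

import org.apache.spark.SparkConf

// Switch Spark's internal compression codec from the default Snappy to LZF,
// which is not affected by the bug discussed in this thread.
val conf = new SparkConf()
  .set("spark.io.compression.codec", "lzf")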
On Mon, Jun 1, 2015 at 8:46 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
Hello Josh,
Are you suggesting
I don't think that 0.9.3 has been released, so I'm assuming that you're
running on branch-0.9.
There's been over 4000 commits between 0.9.3 and 1.3.1, so I'm afraid that
this question doesn't have a concise answer:
https://github.com/apache/spark/compare/branch-0.9...v1.3.1
To narrow down the
Can you share a query or stack trace? More information would make this
question easier to answer.
On Tue, Aug 11, 2015 at 8:50 PM, Ravisankar Mani rrav...@gmail.com wrote:
Hi all,
We got an exception like
“org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call
to
I've opened a PR to fix this; please take a look:
https://github.com/apache/spark/pull/7405
On Tue, Jul 14, 2015 at 11:22 AM, Koert Kuipers ko...@tresata.com wrote:
it works for scala 2.10, but for 2.11 i get:
[ERROR]
Hi Jerry,
Do you have speculation enabled? A write which produces one million files /
output partitions might be using tons of driver memory via the
OutputCommitCoordinator's bookkeeping data structures.
On Sun, Oct 25, 2015 at 5:50 PM, Jerry Lam wrote:
> Hi spark guys,
>
Hi Sjoerd,
Did your job actually *fail* or did it just generate many spurious
exceptions? While the stacktrace that you posted does indicate a bug, I
don't think that it should have stopped query execution because Spark
should have fallen back to an interpreted code path (note the "Failed to
This is https://issues.apache.org/jira/browse/SPARK-10422, which has been
fixed in Spark 1.5.1.
On Wed, Oct 21, 2015 at 4:40 PM, Sourav Mazumder <
sourav.mazumde...@gmail.com> wrote:
> In 1.5.0 if I use randomSplit on a data frame I get this error.
>
> Here is teh code snippet -
>
> val
When we remove this, we should add a style-checker rule to ban the import
so that it doesn't get added back by accident.
On Mon, Nov 9, 2015 at 6:13 PM, Michael Armbrust
wrote:
> Yeah, we should probably remove that.
>
> On Mon, Nov 9, 2015 at 5:54 PM, Ted Yu
Tip: jump straight to 1.5.2; it has some key bug fixes.
Sent from my phone
> On Nov 13, 2015, at 10:02 PM, AlexG wrote:
>
> Never mind; when I switched to Spark 1.5.0, my code works as written and is
> pretty fast! Looking at some Parquet related Spark jiras, it seems that