> ... add new RDD methods in.
>
> How can I specify a custom version? Should I modify the version numbers in
> all the pom.xml files?
>
>
>
> On Dec 5, 2016, at 9:12 PM, Jakob Odersky <ja...@odersky.com> wrote:
It looks like you're having issues with including your custom Spark
version (with the extensions) in your test project. To use your local
Spark version:
1) make sure it has a custom version (let's call it 2.1.0-CUSTOM)
2) publish it to your local machine with `sbt publishLocal`
3) include the
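For illustration, a minimal sketch of what step 3 could look like, assuming
an sbt-based test project and that the custom build was published as
2.1.0-CUSTOM in step 2:

// build.sbt of the test project (sketch): depend on the locally published
// custom Spark build; publishLocal places it in ~/.ivy2/local, which sbt
// resolves from by default.
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0-CUSTOM"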
Hi everyone,
is there any ongoing discussion/documentation on the redesign of sinks?
I think it could be a good thing to abstract away the underlying
streaming model; however, that isn't directly related to Holden's first
point. The way I understand it is to slightly change the
DataStreamWriter
> command and binds to the output fds from that process, so daemonizing is
> causing us minor hardship and seems like an easy thing to make optional.
> We'd be happy to make the PR as well.
>
> --Mike
>
> On Thu, Sep 29, 2016 at 5:25 PM, Jakob Odersky <ja...@odersky
Hi Kabeer,
which version of Spark are you using? I can't reproduce the error on the
latest Spark master.
regards,
--Jakob
I'm curious, what kind of container solutions require foreground
processes? Most init systems work fine with "starter" processes that
run other processes. IIRC systemd and start-stop-daemon have an option
called "fork" that expects the main process to run another one in
the background and
I agree with Sean's answer, you can check out the relevant serializer
here
https://github.com/twitter/chill/blob/develop/chill-scala/src/main/scala/com/twitter/chill/Traversable.scala
On Wed, Sep 28, 2016 at 3:11 AM, Sean Owen wrote:
> My guess is that Kryo specially handles
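As a rough, illustrative sketch (the app name below is made up): Spark's
KryoSerializer builds on Twitter chill, so Scala collections such as
Traversable end up handled by the serializer linked above once Kryo is
enabled, which is just a configuration switch:

// Sketch: switch Spark to Kryo serialization.
val conf = new org.apache.spark.SparkConf()
  .setAppName("kryo-example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")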
Hash codes should try to avoid collisions between objects that are not
equal. Integer overflow is not an issue by itself.
On Wed, Sep 21, 2016 at 10:49 PM, WangJianfei
wrote:
> Thank you very much sir! But what I want to know is whether the hashCode
> overflow will
> ... that a.hashCode == b.hashCode when
> a.equals(b), the bidirectional case is usually harder to satisfy due to
> possibility of collisions.
>
> Good info:
> http://www.programcreek.com/2011/07/java-equals-and-hashcode-contract/
> _____
> From: Jakob Odersky <
Hi,
It is used jointly with a custom implementation of the `equals`
method. In Scala, you can override the `equals` method to change the
behaviour of `==` comparison. One example of this would be to compare
classes based on their parameter values (i.e. what case classes do).
Partitioners aren't
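A small, illustrative sketch of the parameter-based equality described above
(the Point class is made up), with a hashCode that respects the contract:

// Equality based on parameter values, i.e. what a case class would
// generate automatically.
class Point(val x: Int, val y: Int) {
  override def equals(other: Any): Boolean = other match {
    case p: Point => p.x == x && p.y == y
    case _        => false
  }
  // Equal objects must produce equal hash codes; collisions between
  // unequal objects (and integer overflow along the way) are acceptable.
  override def hashCode: Int = 31 * x + y
}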
Hi Xiang,
this error also appears in client mode (maybe the situation that you
were referring to, and that worked, was local mode?); however, the error
is expected and is not a bug.
This line in your snippet:
object Main extends A[String] { //...
is, after desugaring, equivalent to:
object
There are some flaky tests that occasionally fail; my first
recommendation would be to re-run the test suite. Another thing to
check is whether there are any applications listening on Spark's default
ports.
Btw, what is your environment like? In case it is Windows, I don't
think tests are regularly run
+1 to Sean's answer, importing varargs.
In this case the _root_ is also unnecessary (it would be required in
case you were using it in a nested package called "scala" itself)
On Thu, Sep 8, 2016 at 9:27 AM, Sean Owen wrote:
> I think the @_root_ version is redundant because
>
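A tiny sketch of the import in question (the class and method below are
only illustrative):

import scala.annotation.varargs
// _root_.scala.annotation.varargs would only be needed if this code lived
// inside a package that itself contains a nested package named "scala".
class Util {
  // @varargs makes Scala emit a Java-friendly overload taking an array.
  @varargs
  def join(sep: String, parts: String*): String = parts.mkString(sep)
}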
Hi Dayne,
you can look at this page for some starter issues:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20labels%20%3D%20Starter%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened).
Also check out this guide on how to contribute to Spark
implemented.
>
> However, even when generating the file under the default resourceDirectory =>
> core/src/resources, the file isn't picked up in the jar after doing a clean. So this
> seems to be a different issue.
>
> On Thu, May 19, 2016 at 4:17 PM, Jakob Oders
To echo my comment on the PR: I think the "sbt way" to add extra,
generated resources to the classpath is by adding a new task to the
`resourceGenerators` setting. Also, the task should output any files
into the directory specified by the `resourceManaged` setting. See
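A minimal sketch of that setup, with a made-up file name and content:

// build.sbt (sketch): a resource generator task that writes a file under
// resourceManaged so it ends up on the classpath and in the packaged jar.
resourceGenerators in Compile += Def.task {
  val out = (resourceManaged in Compile).value / "build-info.properties"
  IO.write(out, s"version=${version.value}\n")
  Seq(out)
}.taskValue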
I just found out how the hash is calculated:
gpg --print-md sha512 <file>.tgz
you can use that to check whether the resulting output matches the contents
of <file>.tgz.sha
On Mon, Apr 4, 2016 at 3:19 PM, Jakob Odersky <ja...@odersky.com> wrote:
> The published hash is a SHA512.
>
> You can verif
Is someone going to retry fixing these packages? It's still a problem.
>>>>
>>>> Also, it would be good to understand why this is happening.
>>>>
>>>> On Fri, Mar 18, 2016 at 6:49 PM Jakob Odersky <ja...@odersky.com> wrote:
>>
I mean that from the perspective of someone developing Spark, it makes
things more complicated. It's just my point of view; people who
actually support Spark deployments may have a different opinion ;)
On Thu, Mar 24, 2016 at 2:41 PM, Jakob Odersky <ja...@odersky.com> wrote:
> You can, but s
You can, but since it's going to be a maintainability issue I would
argue it is in fact a problem.
On Thu, Mar 24, 2016 at 2:34 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
> Hi Jakob,
>
> On Thu, Mar 24, 2016 at 2:29 PM, Jakob Odersky <ja...@odersky.com> wrote:
>
Reynold's 3rd point is particularly strong in my opinion. Supporting
Scala 2.12 will require Java 8 anyway, and introducing such a change
is probably best done in a major release.
Consider what would happen if Spark 2.0 doesn't require Java 8 and
hence doesn't support Scala 2.12. Will it be stuck on
I just experienced the issue; however, retrying the download a second
time worked. Could it be that there is some load balancer/cache in
front of the archive and some nodes still serve the corrupt packages?
On Fri, Mar 18, 2016 at 8:00 AM, Nicholas Chammas
wrote:
> I'm
com> wrote:
> I just retried the Spark 1.6.1 / Hadoop 2.6 download and got a corrupt ZIP
> file.
>
> Jakob, are you sure the ZIP unpacks correctly for you? Is it the same Spark
> 1.6.1/Hadoop 2.6 package you had a success with?
>
> On Fri, Mar 18, 2016 at 6:11 PM Jakob Odersk
I would recommend (non-binding) option 1.
Apart from the API breakage I can see only advantages, and that sole
disadvantage is minimal for a few reasons:
1. the DataFrame API has been "Experimental" since its implementation,
so no stability was ever implied
2. considering that the change is for
Awesome!
+1 on Steve Loughran's question: how does this affect support for
2.10? Do future contributions need to work with Scala 2.10?
cheers
On Mon, Feb 1, 2016 at 7:02 AM, Ted Yu wrote:
> The following jobs have been established for build against Scala 2.10:
>
>
Nitpick: the up-to-date version of said wiki page is
https://spark.apache.org/docs/1.6.0/job-scheduling.html (not sure how
much it changed though)
On Wed, Jan 27, 2016 at 7:50 PM, Chayapan Khannabha wrote:
> I would start at this wiki page
>
A while ago, I remember reading that support for multiple active Spark
contexts per JVM was a possible future enhancement.
I was wondering if this is still on the roadmap, what the major
obstacles are, and whether I can be of any help in adding this feature?
regards,
--Jakob
make-distribution and the second code snippet both create a distribution
from a clean state. They therefore require that every source file be
compiled, and that takes time (you can maybe tweak some settings or use a
newer compiler to gain some speed).
I'm inferring from your question that for your
Hi,
Datasets are being built upon the experimental DataFrame API; does this
mean DataFrames won't be experimental in the near future?
thanks,
--Jakob
Hey Jeff,
Do you mean reading from multiple text files? In that case, as a
workaround, you can use the RDD#union() (or ++) method to concatenate
multiple rdds. For example:
val lines1 = sc.textFile("file1")
val lines2 = sc.textFile("file2")
val rdd = lines1 union lines2
regards,
--Jakob
On 11
it will change.
>> >
>> > Any improvements for the sbt build are of course welcome (it is still
>> used
>> > by many developers), but i would not do anything that increases the
>> burden
>> > of maintaining two build systems.
>> >
>> >
Hi everyone,
in the process of learning Spark, I wanted to get an overview of the
interaction between all of its sub-projects. I therefore decided to have a
look at the build setup and its dependency management.
Since I am a lot more comfortable using sbt than Maven, I decided to try to
port the
[repost to mailing list]
I don't know much about packages, but have you heard about the
sbt-spark-package plugin?
Looking at the code, specifically
https://github.com/databricks/sbt-spark-package/blob/master/src/main/scala/sbtsparkpackage/SparkPackagePlugin.scala,
might give you insight on the
the path of the source file defining the event API is
`core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala`
On 13 October 2015 at 16:29, Jakob Odersky <joder...@gmail.com> wrote:
> Hi,
> I came across the spark listener API while checking out possible UI
> extensi
Hi,
I came across the spark listener API while checking out possible UI
extensions recently. I noticed that all events inherit from a sealed trait
`SparkListenerEvent` and that a SparkListener has a corresponding
`onEventXXX(event)` method for every possible event.
Considering that events inherit
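For context, a minimal sketch of a custom listener (the class name and log
messages are made up); it can be registered with
sparkContext.addSparkListener(...) or via the spark.extraListeners setting:

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerJobStart}

// A listener that logs job boundaries.
class JobLoggingListener extends SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit =
    println(s"Job ${jobStart.jobId} started with ${jobStart.stageInfos.size} stages")
  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit =
    println(s"Job ${jobEnd.jobId} finished: ${jobEnd.jobResult}")
}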
Hi everyone,
I am just getting started working on Spark and was thinking of a first way
to contribute whilst still trying to wrap my head around the codebase.
Exploring the web UI, I noticed it is a classic request-response website,
requiring manual refresh to get the latest data.
I think it