Hash codes should try to avoid collisions of objects that are not
equal. Integer overflow is not an issue by itself.
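A quick plain-Scala illustration of both points (overflow simply wraps; distinct values may still collide):

```scala
// String.hashCode repeatedly computes 31*h + c; intermediate overflow just
// wraps around in Int arithmetic and still yields a perfectly valid hash.
val big = "a" * 100000
val h = big.hashCode // overflows many times internally; still a legal hash

// Distinct, non-equal values can legitimately collide:
// "Aa" and "BB" both hash to 2112.
assert("Aa" != "BB")
assert("Aa".hashCode == "BB".hashCode)
```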
On Wed, Sep 21, 2016 at 10:49 PM, WangJianfei
wrote:
> Thank you very much sir! But what I want to know is whether the hashcode
> overflow will
Can you provide more details? It's unclear what you're asking
On Wed, Sep 21, 2016 at 10:14 AM, shashikant.kulka...@gmail.com
wrote:
> Hi All,
>
> I am trying to use the JavaRDD.pipe() API.
>
> I have one object with me from the JavaRDD
t a.hashCode == b.hashCode when
> a.equals(b), the bidirectional case is usually harder to satisfy due to
> the possibility of collisions.
>
> Good info:
> http://www.programcreek.com/2011/07/java-equals-and-hashcode-contract/
> _____
> From: Jakob Odersky <
Hi,
It is used jointly with a custom implementation of the `equals`
method. In Scala, you can override the `equals` method to change the
behaviour of `==` comparison. An example of this would be to compare
classes based on their parameter values (i.e. what case classes do).
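A minimal sketch of that pattern (the class and field names are illustrative):

```scala
// Overriding equals (and, per the contract, hashCode) changes what `==` means.
class Point(val x: Int, val y: Int) {
  override def equals(other: Any): Boolean = other match {
    case p: Point => p.x == x && p.y == y
    case _        => false
  }
  override def hashCode: Int = 31 * x + y
}

// Case classes generate an equivalent equals/hashCode automatically.
case class CasePoint(x: Int, y: Int)

assert(new Point(1, 2) == new Point(1, 2)) // structural, via overridden equals
assert(CasePoint(1, 2) == CasePoint(1, 2))
```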
Partitioners aren't
One option would be to use Apache Toree. A quick setup guide can be
found here https://toree.incubator.apache.org/documentation/user/quick-start
On Wed, Sep 21, 2016 at 2:02 PM, Arif,Mubaraka wrote:
> Has anyone installed the scala kernel for Jupyter notebook.
>
>
>
> Any
Your app is fine; I think the error has to do with the way IntelliJ
launches applications. Is your app forked in a new JVM when you run it?
On Wed, Sep 21, 2016 at 2:28 PM, Gokula Krishnan D
wrote:
> Hello Sumit -
>
> I could see that SparkConf() specification is not being
Hi Xiang,
This error also appears in client mode (maybe the situation that you
were referring to and that worked was local mode?); however, the error
is expected and is not a bug.
This line in your snippet:
object Main extends A[String] { //...
is, after desugaring, equivalent to:
object
[
https://issues.apache.org/jira/browse/SPARK-16264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494587#comment-15494587
]
Jakob Odersky commented on SPARK-16264:
---
I just came across this issue through a comment
Hi Xiaoye,
could it be that the executors were spawned before the affinity was
set on the worker? Would it help to start spark worker with taskset
from the beginning, i.e. "taskset [mask] start-slave.sh"?
Workers in spark (standalone mode) simply create processes with the
standard java process
There are some flaky tests that occasionally fail, my first
recommendation would be to re-run the test suite. Another thing to
check is if there are any applications listening to spark's default
ports.
Btw, what is your environment like? In case it is Windows, I don't
think tests are regularly run
[
https://issues.apache.org/jira/browse/SPARK-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478428#comment-15478428
]
Jakob Odersky commented on SPARK-14221:
---
I just saw that chill already [has a pending PR to upgrade
st use-case of Spark though and will probably be a
performance bottleneck.
On Fri, Sep 9, 2016 at 11:45 AM, Jakob Odersky <ja...@odersky.com> wrote:
> Hi Sujeet,
>
> going sequentially over all parallel, distributed data seems like a
> counter-productive thing to do. What are you
Hi Sujeet,
going sequentially over all parallel, distributed data seems like a
counter-productive thing to do. What are you trying to accomplish?
regards,
--Jakob
On Fri, Sep 9, 2016 at 3:29 AM, sujeet jog wrote:
> Hi,
> Is there a way to iterate over a DataFrame with n
[
https://issues.apache.org/jira/browse/SPARK-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477903#comment-15477903
]
Jakob Odersky edited comment on SPARK-14221 at 9/9/16 6:30 PM:
---
[~joshrosen
[
https://issues.apache.org/jira/browse/SPARK-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477903#comment-15477903
]
Jakob Odersky commented on SPARK-14221:
---
[~joshrosen]'s upstream PR requires Kryo 3.1, a version
+1 to Sean's answer, importing varargs.
In this case the _root_ is also unnecessary (it would be required in
case you were using it in a nested package called "scala" itself)
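A small sketch of the import in question (the `Util` class is illustrative):

```scala
// _root_ is only needed when a nested package named "scala" shadows the
// real top-level one; here the plain import suffices.
import scala.annotation.varargs

class Util {
  // @varargs additionally emits a Java-friendly sum(int...) overload.
  @varargs
  def sum(xs: Int*): Int = xs.sum
}

assert(new Util().sum(1, 2, 3) == 6)
```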
On Thu, Sep 8, 2016 at 9:27 AM, Sean Owen wrote:
> I think the @_root_ version is redundant because
>
(Maybe unrelated FYI): in case you're using only Scala or Java with
Spark, I would recommend using Datasets instead of DataFrames. They
provide exactly the same functionality, yet offer more type-safety.
On Thu, Sep 8, 2016 at 11:05 AM, Lee Becker wrote:
>
> On Thu, Sep
[
https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471428#comment-15471428
]
Jakob Odersky commented on SPARK-17368:
---
Hmm, you're right my assumption was of using only value
[
https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468833#comment-15468833
]
Jakob Odersky edited comment on SPARK-17368 at 9/6/16 10:57 PM:
So I
[
https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468833#comment-15468833
]
Jakob Odersky commented on SPARK-17368:
---
So I thought about this a bit more and although
[
https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15459707#comment-15459707
]
Jakob Odersky commented on SPARK-17368:
---
Yeah macros would be awesome, something with Scala.meta
[
https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15459587#comment-15459587
]
Jakob Odersky commented on SPARK-17368:
---
I'm currently taking a look at this but my first analysis
Hi Dayne,
you can look at this page for some starter issues:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20labels%20%3D%20Starter%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened).
Also check out this guide on how to contribute to Spark
Spark currently requires at least Java 1.7, so adding a Java
1.8-specific encoder will not be straightforward without affecting
requirements. I can think of two solutions:
1. add a Java 1.8 build profile which includes such encoders (this may
be useful for Scala 2.12 support in the future as
[
https://issues.apache.org/jira/browse/SPARK-17367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15457862#comment-15457862
]
Jakob Odersky commented on SPARK-17367:
---
You're absolutely correct, it is a Scala issue. I raised
Forgot to answer your question about feature parity of Python w.r.t.
Spark's different components
I mostly work with Scala so I can't say for sure, but I think that all
pre-2.0 features (that's basically everything except Structured Streaming)
are on par. Structured Streaming is a pretty new
As you point out, often the reason that Python support lags behind is that
functionality is implemented in Scala, so the API in that language is
"free" whereas Python support needs to be added explicitly. Nevertheless,
Python bindings are an important part of Spark and are used by many people
(this
[
https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456966#comment-15456966
]
Jakob Odersky edited comment on SPARK-17368 at 9/1/16 11:48 PM:
FYI
[
https://issues.apache.org/jira/browse/SPARK-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456966#comment-15456966
]
Jakob Odersky commented on SPARK-17368:
---
FYI the issue also occurs for top-level value classes (i.e
I'm not sure how the shepherd thing works, but just FYI Michael
Armbrust originally wrote Catalyst, the engine behind Datasets.
You can find a list of all committers here
https://cwiki.apache.org/confluence/display/SPARK/Committers. Another
good resource is to check https://spark-prs.appspot.com/
> However, what really worries me is not having Dataset APIs at all in
> Python. I think that's a deal breaker.
What is the functionality you are missing? In Spark 2.0 a DataFrame is just
an alias for Dataset[Row] ("type DataFrame = Dataset[Row]" in
core/.../o/a/s/sql/package.scala).
Since Python is
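A plain-Scala sketch (not Spark's actual classes) of why the alias implies feature parity:

```scala
// Mimicking the shape of Spark 2.0's relationship between the two types:
class Dataset[T](val rows: Seq[T]) {
  def count(): Long = rows.length.toLong
}
class Row(val values: Seq[Any])

object Types {
  // An alias adds no new type: anything a DataFrame can do, a Dataset can do.
  type DataFrame = Dataset[Row]
}
import Types.DataFrame

val df: DataFrame = new Dataset(Seq(new Row(Seq(1, "a"))))
assert(df.count() == 1L)
```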
Hi Aris,
thanks for sharing this issue. I can confirm that value classes
currently don't work; however, I can't think of a reason why they
shouldn't be supported. I would therefore recommend that you report
this as a bug.
(Btw, value classes also currently aren't definable in the REPL. See
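For reference, a minimal value class in plain Scala (the name is illustrative); value classes must be defined at the top level, which is exactly why the REPL rejects them:

```scala
// A value class: wraps an Int without allocating a wrapper object at runtime.
class Meters(val value: Int) extends AnyVal {
  def +(that: Meters): Meters = new Meters(value + that.value)
}

val total = new Meters(3) + new Meters(4)
assert(total.value == 7)
```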
Jakob Odersky created SPARK-17367:
-
Summary: Cannot define value classes in REPL
Key: SPARK-17367
URL: https://issues.apache.org/jira/browse/SPARK-17367
Project: Spark
Issue Type: Bug
Implementing custom encoders is unfortunately not well supported at
the moment (IIRC there are plans to eventually add an api for user
defined encoders).
That being said, there are a couple of encoders that can work with
generic, serializable data types: "javaSerialization" and "kryo",
found here
[
https://issues.apache.org/jira/browse/SPARK-17103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424228#comment-15424228
]
Jakob Odersky edited comment on SPARK-17103 at 8/17/16 5:28 PM:
That's
[
https://issues.apache.org/jira/browse/SPARK-17103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424228#comment-15424228
]
Jakob Odersky edited comment on SPARK-17103 at 8/17/16 9:53 AM:
That's
[
https://issues.apache.org/jira/browse/SPARK-17103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424228#comment-15424228
]
Jakob Odersky edited comment on SPARK-17103 at 8/17/16 9:52 AM:
That's
[
https://issues.apache.org/jira/browse/SPARK-17103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424228#comment-15424228
]
Jakob Odersky commented on SPARK-17103:
---
That's true, the Spark REPL is basically just a thin
[
https://issues.apache.org/jira/browse/SPARK-17095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423602#comment-15423602
]
Jakob Odersky commented on SPARK-17095:
---
Since this bug also occurs when there are no opening
Does the file /home/user/spark-1.5.1-bin-hadoop2.4/bin/README.md exist?
On Tue, Jul 19, 2016 at 4:30 AM, RK Spark wrote:
> val textFile = sc.textFile("README.md")val linesWithSpark =
> textFile.filter(line => line.contains("Spark"))
>
Hi Eli,
to build spark, just run
build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests package
in your source directory, where package is the actual word "package".
This will recompile the whole project, so it may take a while when
running the first time.
Replacing a single file
[
https://issues.apache.org/jira/browse/SPARK-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299168#comment-15299168
]
Jakob Odersky commented on SPARK-15014:
---
You might still have some issues with classloaders, I
[
https://issues.apache.org/jira/browse/SPARK-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299118#comment-15299118
]
Jakob Odersky commented on SPARK-15014:
---
spark-shell is a very thin wrapper around the standard
Spark actually used to depend on Akka. Unfortunately this brought in
all of Akka's dependencies (in addition to Spark's already quite
complex dependency graph) and, as Todd mentioned, led to conflicts
with projects using both Spark and Akka.
It would probably be possible to use Akka and shade it
implemented.
>
> However, even on generating the file under the default resourceDirectory =>
> core/src/resources doesn't pick the file in jar after doing a clean. So this
> seems to be a different issue.
>
>
>
>
>
> On Thu, May 19, 2016 at 4:17 PM, Jakob Oders
To echo my comment on the PR: I think the "sbt way" to add extra,
generated resources to the classpath is by adding a new task to the
`resourceGenerators` setting. Also, the task should output any files
into the directory specified by the `resourceManaged` setting. See
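A build.sbt sketch of that pattern (sbt 0.13-era syntax; the file name and contents are illustrative):

```scala
// Register a task that writes a file under resourceManaged; sbt copies
// anything the task returns onto the runtime classpath.
resourceGenerators in Compile += Def.task {
  val out = (resourceManaged in Compile).value / "build-info.txt"
  IO.write(out, "version=" + version.value)
  Seq(out)
}.taskValue
```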
[
https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289738#comment-15289738
]
Jakob Odersky commented on SPARK-13581:
---
I can't reproduce it anymore either
> LibSVM thr
[
https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289738#comment-15289738
]
Jakob Odersky edited comment on SPARK-13581 at 5/18/16 8:26 PM:
I can't
[
https://issues.apache.org/jira/browse/SPARK-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259100#comment-15259100
]
Jakob Odersky commented on SPARK-14519:
---
That sounds reasonable, however should the parent JIRA
[
https://issues.apache.org/jira/browse/SPARK-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259079#comment-15259079
]
Jakob Odersky commented on SPARK-14146:
---
the reason this fails is because spark-shell sets
[
https://issues.apache.org/jira/browse/SPARK-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258974#comment-15258974
]
Jakob Odersky commented on SPARK-14519:
---
From a reply in the mailing list archive (14/4/2
[
https://issues.apache.org/jira/browse/SPARK-14417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258883#comment-15258883
]
Jakob Odersky commented on SPARK-14417:
---
I suggested that Arun add the JIRA in the title and close
[
https://issues.apache.org/jira/browse/SPARK-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258702#comment-15258702
]
Jakob Odersky commented on SPARK-14511:
---
Release is out, PR has been submitted
> Publish
[
https://issues.apache.org/jira/browse/SPARK-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257217#comment-15257217
]
Jakob Odersky commented on SPARK-14511:
---
Update: an issue was discovered during release-testing
[
https://issues.apache.org/jira/browse/SPARK-10001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251304#comment-15251304
]
Jakob Odersky commented on SPARK-10001:
---
FYI, I took up the issue (previous pr #8216)
> Allow C
[
https://issues.apache.org/jira/browse/SPARK-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246336#comment-15246336
]
Jakob Odersky commented on SPARK-14511:
---
cf https://github.com/typesafehub/genjavadoc/issues/73
I
[
https://issues.apache.org/jira/browse/SPARK-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242725#comment-15242725
]
Jakob Odersky commented on SPARK-7992:
--
[~mengxr], The PR is finally in! Let's hope upstream makes
to unsubscribe, send an email to user-unsubscr...@spark.apache.org
On Tue, Apr 5, 2016 at 4:50 PM, Ranjana Rajendran
wrote:
> I get to see the threads in the public mailing list. I don't want so many
> messages in my inbox. I want to unsubscribe.
I just found out how the hash is calculated:
gpg --print-md sha512 .tgz
you can use that to check if the resulting output matches the contents
of .tgz.sha
On Mon, Apr 4, 2016 at 3:19 PM, Jakob Odersky <ja...@odersky.com> wrote:
> The published hash is a SHA512.
>
> You can verif
Is someone going to retry fixing these packages? It's still a problem.
>>>>
>>>> Also, it would be good to understand why this is happening.
>>>>
>>>> On Fri, Mar 18, 2016 at 6:49 PM Jakob Odersky <ja...@odersky.com> wrote:
>>
[
https://issues.apache.org/jira/browse/SPARK-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215315#comment-15215315
]
Jakob Odersky commented on SPARK-7992:
--
[~mengxr], I just submitted [another
PR|https://github.com
I mean from the perspective of someone developing Spark, it makes
things more complicated. It's just my point of view, people that
actually support Spark deployments may have a different opinion ;)
On Thu, Mar 24, 2016 at 2:41 PM, Jakob Odersky <ja...@odersky.com> wrote:
> You can, but s
You can, but since it's going to be a maintainability issue I would
argue it is in fact a problem.
On Thu, Mar 24, 2016 at 2:34 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
> Hi Jakob,
>
> On Thu, Mar 24, 2016 at 2:29 PM, Jakob Odersky <ja...@odersky.com> wrote:
&
Reynold's 3rd point is particularly strong in my opinion. Supporting
Scala 2.12 will require Java 8 anyway, and introducing such a change
is probably best done in a major release.
Consider what would happen if Spark 2.0 doesn't require Java 8 and
hence doesn't support Scala 2.12. Will it be stuck on
[
https://issues.apache.org/jira/browse/SPARK-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209280#comment-15209280
]
Jakob Odersky edited comment on SPARK-7992 at 3/23/16 10:16 PM:
Hey
[
https://issues.apache.org/jira/browse/SPARK-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209280#comment-15209280
]
Jakob Odersky commented on SPARK-7992:
--
Hey Xiangrui,
you caught me in a very busy time last week
Another gotcha to watch out for is the SPARK_* environment variables.
Have you exported SPARK_HOME? In that case, 'spark-shell' will use
Spark from the variable, regardless of the place the script is called
from.
I.e. if SPARK_HOME points to a release version of Spark, your code
changes will
Can you share a snippet that reproduces the error? What was
spark.sql.autoBroadcastJoinThreshold before your last change?
On Thu, Mar 17, 2016 at 10:03 AM, Jiří Syrový wrote:
> Hi,
>
> any idea what could be causing this issue? It started appearing after
> changing
The error is very strange indeed, however without code that reproduces
it, we can't really provide much help beyond speculation.
One thing that stood out to me immediately is that you say you have an
RDD of Any where every Any should be a BigDecimal, so why not specify
that type information?
When
Doesn't FileInputFormat require type parameters? Like so:
class RawDataInputFormat[LW <: LongWritable, RD <: RDRawDataRecord]
extends FileInputFormat[LW, RD]
I haven't verified this but it could be related to the compile error
you're getting.
On Thu, Mar 17, 2016 at 9:53 AM, Benyi Wang
Hi,
regarding 1, packages are resolved locally. That means that when you
specify a package, spark-submit will resolve the dependencies and
download any jars on the local machine, before shipping* them to the
cluster. So, without a priori knowledge of dataproc clusters, it
should be no different to
line of spark-submit or pyspark. See
>> http://spark.apache.org/docs/latest/submitting-applications.html
>>
>> _
>> From: Jakob Odersky <ja...@odersky.com>
>> Sent: Thursday, March 17, 2016 6:40 PM
>> Subject: Re: installing pa
[
https://issues.apache.org/jira/browse/SPARK-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197877#comment-15197877
]
Jakob Odersky commented on SPARK-7992:
--
I'll check it out
> Hide private classes/obje
I just experienced the issue, however retrying the download a second
time worked. Could it be that there is some load balancer/cache in
front of the archive and some nodes still serve the corrupt packages?
On Fri, Mar 18, 2016 at 8:00 AM, Nicholas Chammas
wrote:
> I'm
com> wrote:
> I just retried the Spark 1.6.1 / Hadoop 2.6 download and got a corrupt ZIP
> file.
>
> Jakob, are you sure the ZIP unpacks correctly for you? Is it the same Spark
> 1.6.1/Hadoop 2.6 package you had a success with?
>
> On Fri, Mar 18, 2016 at 6:11 PM Jakob Odersk
Jakob Odersky created SPARK-13929:
-
Summary: Use Scala reflection for UDFs
Key: SPARK-13929
URL: https://issues.apache.org/jira/browse/SPARK-13929
Project: Spark
Issue Type: Bug
[
https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196928#comment-15196928
]
Jakob Odersky commented on SPARK-13118:
---
Update: there was actually an issue with inner classes (or package
[
https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196463#comment-15196463
]
Jakob Odersky commented on SPARK-13118:
---
Should I remove the JIRA ID from my existing PR
Hi Mich,
probably unrelated to the current error you're seeing; however, the
following dependencies will bite you later:
spark-hive_2.10
spark-csv_2.11
the problem here is that you're using libraries built for different
Scala binary versions (the numbers after the underscore). The simple
fix here
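In sbt, the usual fix is to let `%%` pick the binary version consistently (the version numbers here are illustrative):

```scala
scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  // %% appends the project's Scala binary version (_2.10) to each artifact
  // name, so spark-hive and spark-csv can never end up on mismatched versions.
  "org.apache.spark" %% "spark-hive" % "1.6.1",
  "com.databricks"   %% "spark-csv"  % "1.4.0"
)
```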
[
https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194465#comment-15194465
]
Jakob Odersky commented on SPARK-13118:
---
Sure, I'll submit a PR with the test
> Supp
Have you tried setting the configuration
`spark.executor.extraLibraryPath` to point to a location where your
.so's are available? (Not sure if non-local files, such as HDFS, are
supported)
On Mon, Mar 14, 2016 at 2:12 PM, Tristan Nixon wrote:
> What build system are you
[
https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194163#comment-15194163
]
Jakob Odersky commented on SPARK-13118:
---
[~marmbrus], what's the issue at hand? Creating a simple
[
https://issues.apache.org/jira/browse/SPARK-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193968#comment-15193968
]
Jakob Odersky commented on SPARK-13118:
---
If I recall correctly, I couldn't reproduce the issue
Regarding my previous message, I forgot to mention to run netstat as
root (sudo netstat -plunt).
Sorry for the noise.
On Fri, Mar 11, 2016 at 12:29 AM, Jakob Odersky <ja...@odersky.com> wrote:
> Some more diagnostics/suggestions:
>
> 1) are other services listening to ports in the
ommand env|grep SPARK; nothing comes back
>>>>
>>>> Tried env|grep Spark; which is the directory I created for Spark once I
>>>> downloaded the tgz file; comes back with PWD=/Users/aidatefera/Spark
>>>>
>>>> Tried running ./bin/spark-shell ; come
Sorry had a typo in my previous message:
> try running just "/bin/spark-shell"
please remove the leading slash (/)
On Wed, Mar 9, 2016 at 1:39 PM, Aida Tefera wrote:
> Hi there, tried echo $SPARK_HOME but nothing comes back so I guess I need to
> set it. How would I do
As Tristan mentioned, it looks as though Spark is trying to bind on
port 0 and then 1 (which is not allowed). Could it be that some
environment variables from your previous installation attempts are
polluting your configuration?
What does running "env | grep SPARK" show you?
Also, try running just
I've had some issues myself with the user-provided-Hadoop version.
If you just want to get started, I would recommend downloading
Spark (pre-built, with any of the Hadoop versions), as Cody suggested.
A simple step-by-step guide:
1. curl
[
https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jakob Odersky updated SPARK-13581:
--
Description:
When running an action on a DataFrame obtained by reading from a libsvm file
[
https://issues.apache.org/jira/browse/SPARK-13581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173058#comment-15173058
]
Jakob Odersky commented on SPARK-13581:
---
It's in spark "data/mllib/sample_libsvm_dat
Jakob Odersky created SPARK-13581:
-
Summary: LibSVM throws MatchError
Key: SPARK-13581
URL: https://issues.apache.org/jira/browse/SPARK-13581
Project: Spark
Issue Type: Bug
I would recommend (non-binding) option 1.
Apart from the API breakage I can see only advantages, and that sole
disadvantage is minimal for a few reasons:
1. the DataFrame API has been "Experimental" since its implementation,
so no stability was ever implied
2. considering that the change is for
[
https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jakob Odersky reopened SPARK-7768:
--
> Make user-defined type (UDT) API pub
[
https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jakob Odersky closed SPARK-7768.
Resolution: Fixed
> Make user-defined type (UDT) API pub
[
https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168155#comment-15168155
]
Jakob Odersky commented on SPARK-7768:
--
[~marmbrus]
UDTs are public now (in Scala at least), can
[
https://issues.apache.org/jira/browse/SPARK-12878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160119#comment-15160119
]
Jakob Odersky edited comment on SPARK-12878 at 2/25/16 10:22 PM:
-
I just
[
https://issues.apache.org/jira/browse/SPARK-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167911#comment-15167911
]
Jakob Odersky commented on SPARK-10712:
---
Any news on this? Is it still an issue?
> JVM cras
Hi Guillermo,
assuming that the first "a,b" is a typo and you actually meant "a,d",
this is a sorting problem.
You could easily model your data as an RDD of tuples (or as a
dataframe/set) and use the sortBy (or orderBy for dataframes/sets)
methods.
best,
--Jakob
On Wed, Feb 24, 2016 at 2:26 PM,
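The same idea on plain Scala collections (Spark's `RDD.sortBy` and `orderBy` behave analogously, just distributed):

```scala
// Sort records by their second field; sortBy takes a key-extraction function.
val records = Seq(("a", 4), ("d", 1), ("b", 3), ("c", 2))
val sorted  = records.sortBy(_._2)

assert(sorted.map(_._1) == Seq("d", "c", "b", "a"))
```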