join and joinWith are just two different join semantics; this is not about
Dataset vs DataFrame.
join is the relational join, where fields are flattened; joinWith is more
like a tuple join, where the output has two fields that are nested.
So you can do
Dataset[A] joinWith Dataset[B] = Dataset[(A, B)]
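A minimal sketch of the difference, assuming two made-up case classes and an
"id" column to join on (all names here are hypothetical, for illustration
only):

  import org.apache.spark.sql.SparkSession

  // Hypothetical record types, just for the example.
  case class A(id: Long, x: String)
  case class B(id: Long, y: String)

  val spark = SparkSession.builder.appName("join-vs-joinWith").getOrCreate()
  import spark.implicits._

  val dsA = Seq(A(1, "a")).toDS()
  val dsB = Seq(B(1, "b")).toDS()

  // Relational join: columns are flattened into a single row.
  val flat = dsA.join(dsB, "id")                          // columns: id, x, y

  // Tuple join: each side survives intact as a nested field.
  val nested = dsA.joinWith(dsB, dsA("id") === dsB("id")) // Dataset[(A, B)]

joinWith hands you the original objects back on both sides, which is what
makes it feel like a typed, tuple-style join rather than a relational one.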
Vote for option 2.
Source compatibility and binary compatibility are very important from the
user's perspective.
It's unfair for Java developers that they don't have the DataFrame
abstraction. As you said, sometimes it is more natural to think about
DataFrame.
I am wondering if conceptually there is
Yes - and that's why source compatibility is broken.
Note that it is not just a "convenience" thing. Conceptually DataFrame is a
Dataset[Row], and for some developers it is more natural to think about
"DataFrame" rather than "Dataset[Row]".
If we were in C++, DataFrame would've been a type alias
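In Scala that alias is a one-liner; roughly (a simplified sketch of what the
Spark 2.0 source does, not the verbatim definition):

  package org.apache.spark

  package object sql {
    // A DataFrame is just an untyped Dataset of Rows. Java has no type
    // aliases, which is exactly what prompts the question below.
    type DataFrame = Dataset[Row]
  }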
Since a type alias is purely a convenience thing for the Scala compiler,
does option 1 mean that the concept of DataFrame ceases to exist from a
Java perspective, and Java developers will have to refer to Dataset?
On Thu, Feb 25, 2016 at 6:23 PM, Reynold Xin wrote:
> When we first
It might make sense, but this option seems to carry all the cons of Option
2, and yet doesn't provide compatibility for Java?
On Thu, Feb 25, 2016 at 3:31 PM, Michael Malak
wrote:
> Would it make sense (in terms of feasibility, code organization, and
> politics) to
Would it make sense (in terms of feasibility, code organization, and
politics) to have a JavaDataFrame, as a way to isolate the 1000+ extra lines
to a Java compatibility layer/class?
From: Reynold Xin
To: "dev@spark.apache.org"
Sent:
Vote for Option 1.
1) Since 2.0 is a major API release, we are expecting some API changes.
2) It helps long-term code base maintenance, with short-term pain on the
Java side.
3) Not quite sure how large the code base using the Java DataFrame APIs is.
On Thu, Feb 25, 2016 at 3:23 PM, Reynold Xin wrote:
When we first introduced Dataset in 1.6 as an experimental API, we wanted
to merge Dataset/DataFrame but couldn't, because we didn't want to break the
pre-existing DataFrame API (e.g. the map function should return a Dataset
rather than an RDD). In Spark 2.0, one of the main API changes is to merge
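For context, a small sketch of the map behavior being described, written
against 2.0-style APIs (the SparkSession setup is illustrative boilerplate):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder.appName("merged-map").getOrCreate()
  import spark.implicits._

  val df = Seq(("a", 1), ("b", 2)).toDF("k", "v")

  // With the merged API, map stays in the Dataset world instead of
  // dropping back to an RDD the way DataFrame.map did in 1.x.
  val doubled = df.map(row => row.getInt(1) * 2) // Dataset[Int], not RDD[Int]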
Thank you, your version of the mvn invocation (as opposed to my bare "mvn
eclipse:eclipse") worked perfectly.
On Thu, Feb 25, 2016 at 3:22 PM, Yin Yang wrote:
> In yarn/.classpath, I see:
>
>
> Here is the command I used:
>
> build/mvn clean -Phive -Phive-thriftserver
alright, the update is done and worker-08 rebooted. we're back up and
building already!
On Thu, Feb 25, 2016 at 8:15 AM, shane knapp wrote:
> this is happening now.
>
> On Wed, Feb 24, 2016 at 6:08 PM, shane knapp wrote:
>> the security update has been
this is happening now.
On Wed, Feb 24, 2016 at 6:08 PM, shane knapp wrote:
> the security update has been released, and it's a doozy!
>
> https://wiki.jenkins-ci.org/display/SECURITY/Security+Advisory+2016-02-24
>
> i will be putting jenkins into quiet mode ~7am PST
In yarn/.classpath, I see:
Here is the command I used:
build/mvn clean -Phive -Phive-thriftserver -Pyarn -Phadoop-2.6
-Dhadoop.version=2.7.0 package -DskipTests eclipse:eclipse
FYI
On Thu, Feb 25, 2016 at 6:13 AM, Łukasz Gieroń wrote:
> I've just checked, and "mvn
well, I am using IDEA to import the code base.
At 2016-02-25 22:13:11, "Łukasz Gieroń" wrote:
I've just checked, and "mvn eclipse:eclipse" generates incorrect projects as
well.
On Thu, Feb 25, 2016 at 3:04 PM, Allen Zhang wrote:
why not use
I've just checked, and "mvn eclipse:eclipse" generates incorrect projects
as well.
On Thu, Feb 25, 2016 at 3:04 PM, Allen Zhang wrote:
> why not use maven
> At 2016-02-25 21:55:49, "lgieron" wrote:
> >The Spark projects generated by sbt
dev/change-scala-version 2.10 may help you?
At 2016-02-25 21:55:49, "lgieron" wrote:
>The Spark projects generated by the sbt eclipse plugin have incorrect
>dependent projects (as visible on Properties -> Java Build Path -> Projects
>tab). All dependent projects are missing
why not use maven
At 2016-02-25 21:55:49, "lgieron" wrote:
>The Spark projects generated by the sbt eclipse plugin have incorrect
>dependent projects (as visible on Properties -> Java Build Path -> Projects
>tab). All dependent projects are missing the "_2.11" suffix (for
The Spark projects generated by the sbt eclipse plugin have incorrect
dependent projects (as visible on Properties -> Java Build Path -> Projects
tab). All dependent projects are missing the "_2.11" suffix (for example, it's
"spark-core" instead of the correct "spark-core_2.11"). This of course causes
the
Hi,
I am debugging a situation where SortShuffleWriter sometimes fails to
create a file, with the following stack trace:
16/02/23 11:48:46 ERROR Executor: Exception in task 13.0 in stage
47827.0 (TID 1367089)
java.io.FileNotFoundException: