On ...5, 2016 at 11:37 PM, Sun, Rui <rui@intel.com> wrote:
>>
>> Vote for option 2.
>>
>> Source compatibility and binary compatibility are very important from
>> the user's perspective.
>>
>> It's unfair for Java developers that they don't have DataFrame ...
>
>
> But obviously Dataset[Row] is not internally Dataset[Row(value: Row)].
>
>
>
> *From:* Reynold Xin [mailto:r...@databricks.com]
> *Sent:* Friday, February 26, 2016 3:55 PM
> *To:* Sun, Rui <rui@intel.com>
> *Cc:* Koert Kuipers <ko...@tresata.com>; dev@spark.apache.org
> *Subject:* Re: [discuss] DataFrame vs Dataset in Spark 2.0
The join and joinWith ...

> ... DataFrame of Row?
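For context on the join/joinWith distinction being discussed: in the Spark 2.x API, `join` on typed Datasets returns an untyped DataFrame of Row, while `joinWith` keeps both element types as a Dataset of pairs. A minimal Scala sketch (illustrative class names; requires a SparkSession to actually run):

```scala
// Illustrative sketch of join vs joinWith (Spark 2.x Scala API).
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

case class Person(name: String, deptId: Int)
case class Dept(id: Int, dept: String)

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val people: Dataset[Person] = Seq(Person("alice", 1)).toDS()
val depts: Dataset[Dept]    = Seq(Dept(1, "eng")).toDS()

// join: relational semantics; the static element types are lost and the
// result is a DataFrame, i.e. Dataset[Row]
val joined: DataFrame = people.join(depts, people("deptId") === depts("id"))

// joinWith: typed semantics; each result element is a pair of the
// original objects
val joinedTyped: Dataset[(Person, Dept)] =
  people.joinWith(depts, people("deptId") === depts("id"))
```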
>
>
>
> *From:* Reynold Xin [mailto:r...@databricks.com]
> *Sent:* Friday, February 26, 2016 8:52 AM
> *To:* Koert Kuipers <ko...@tresata.com>
> *Cc:* dev@spark.apache.org
> *Subject:* Re: [discuss] DataFrame vs Dataset in Spark 2.0
>
>
>
>
Yes - and that's why source compatibility is broken.
Note that it is not just a "convenience" thing. Conceptually DataFrame is a
Dataset[Row], and for some developers it is more natural to think about
"DataFrame" rather than "Dataset[Row]".
If we were in C++, DataFrame would've been a type alias for Dataset[Row].
since a type alias is purely a convenience thing for the scala compiler,
does option 1 mean that the concept of DataFrame ceases to exist from a
java perspective, and they will have to refer to Dataset?
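For the record, Option 1 makes DataFrame a Scala type alias, defined (shown here in simplified form) in the org.apache.spark.sql package object. Since an alias exists only for the Scala compiler, no DataFrame class is emitted in bytecode, which is exactly why Java callers would be left with Dataset<Row>:

```scala
// Simplified sketch of Option 1; the real definition lives in the
// org.apache.spark.sql package object.
package org.apache.spark

package object sql {
  // Purely a compile-time name: no DataFrame class appears in the
  // compiled bytecode, so Java code can only refer to Dataset<Row>.
  type DataFrame = Dataset[Row]
}
```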
On Thu, Feb 25, 2016 at 6:23 PM, Reynold Xin wrote:
> When we first ...
It might make sense, but this option seems to carry all the cons of Option
2, and yet doesn't provide compatibility for Java?
On Thu, Feb 25, 2016 at 3:31 PM, Michael Malak wrote:
> Would it make sense (in terms of feasibility, code organization, and
> politically) to have a JavaDataFrame, as a way to isolate the 1000+ extra
> lines to a Java compatibility layer/class?
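Such a compatibility layer would presumably be a thin wrapper delegating to the underlying Dataset<Row>. A hypothetical, Spark-free sketch of the idea (all names here are illustrative, not Spark's actual API):

```java
// Hypothetical sketch of the "JavaDataFrame compatibility layer" idea.
// Row, Dataset, and JavaDataFrame are stand-ins, not Spark's real classes.
import java.util.List;

class Row {
    final List<Object> values;
    Row(List<Object> values) { this.values = values; }
}

class Dataset<T> {
    private final List<T> data;
    Dataset(List<T> data) { this.data = data; }
    long count() { return data.size(); }
}

// The compatibility layer: a Java-visible DataFrame type that simply
// forwards every call to the wrapped Dataset<Row>.
class JavaDataFrame {
    private final Dataset<Row> underlying;
    JavaDataFrame(Dataset<Row> underlying) { this.underlying = underlying; }
    long count() { return underlying.count(); }
}
```

The cost Michael alludes to is that every Dataset method would need a forwarding stub like count() above, which is where the 1000+ extra lines come from.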
From: Reynold Xin
To: "dev@spark.apache.org"
Sent: ...
Vote for Option 1.

1) Since 2.0 is a major API release, we are expecting some API changes.
2) It helps long-term code base maintenance, with short-term pain on the
Java side.
3) Not quite sure how large the code base using the Java DataFrame APIs is.
On Thu, Feb 25, 2016 at 3:23 PM, Reynold Xin wrote: ...