Re: Pandas' Shift in Dataframe

Olivier Girardot Sat, 02 May 2015 13:08:15 -0700

To close this thread: rxin created a broader JIRA to handle window functions
in DataFrames: https://issues.apache.org/jira/browse/SPARK-7322
Thanks, everyone.


On Wed, Apr 29, 2015 at 10:51 PM, Olivier Girardot <
o.girar...@lateral-thoughts.com> wrote:

> To give you a broader idea of the current use case, I have a few
> transformations (sorts and column creations) oriented towards a simple goal.
> My data is timestamped, and if two lines are identical, the time
> difference between them has to be more than X days for the line to be kept.
> So a few shifts are done, but very locally: only -1 or +1.
>
> FYI regarding JIRA, I created one -
> https://issues.apache.org/jira/browse/SPARK-7247 - associated with this
> discussion.
> @rxin considering that, in my use case, the data is sorted beforehand, there
> might be a better way - but I guess some shuffle would be needed anyway...
>
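A minimal sketch of the use case described above, in plain Python rather than Spark or Pandas. It assumes the rows are already sorted by payload and timestamp, and it takes one possible reading of the rule: a row that repeats the previous row's payload is kept only if more than X days separate the two. The names `Row`, `filter_close_duplicates`, and `min_gap` are hypothetical and do not come from the thread.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Hypothetical row shape: (payload, timestamp). Not from the thread.
Row = Tuple[str, datetime]


def filter_close_duplicates(rows: List[Row], min_gap: timedelta) -> List[Row]:
    """Drop a row when it repeats the previous row's payload within min_gap.

    Assumes rows are already sorted by (payload, timestamp).
    """
    # "Shift by +1": pair every row with the row just before it.
    previous: List[Optional[Row]] = [None] + rows[:-1]
    kept = []
    for current, prev in zip(rows, previous):
        is_repeat = prev is not None and prev[0] == current[0]
        if is_repeat and current[1] - prev[1] <= min_gap:
            continue  # identical payload, too close in time: drop it
        kept.append(current)
    return kept


rows = [
    ("a", datetime(2015, 4, 1)),
    ("a", datetime(2015, 4, 2)),   # 1 day after the previous "a": dropped
    ("a", datetime(2015, 4, 20)),  # 18 days after the previous "a": kept
    ("b", datetime(2015, 4, 3)),   # first "b": kept
]
result = filter_close_duplicates(rows, min_gap=timedelta(days=7))
```

Each row is compared with its immediate predecessor in the sorted input, not with the last row that was kept; the other reading of the rule would need a small change to the loop.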
>
> On Wed, Apr 29, 2015 at 10:34 PM, Evan R. Sparks <evan.spa...@gmail.com>
> wrote:
>
>> In general there's a tension between ordered data and the set-oriented data
>> model underlying DataFrames. You can force a total ordering on the data,
>> but it may come at a high cost with respect to performance.
>>
>> It would be good to get a sense of the use case you're trying to support,
>> but I can imagine achieving a similar result by applying a
>> datetime.timedelta (in Python terms) to a time attribute (your "axis") and
>> then performing a join between the base table and this derived table to
>> merge the data back together. This type of join could then be optimized if
>> the use case is frequent enough to warrant it.
>>
>> - Evan
>>
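A rough sketch of the timedelta-plus-join idea above, in plain Python with dictionaries standing in for tables; it is not Spark code. It assumes regularly spaced daily timestamps, so that adding a `datetime.timedelta` to the time key and joining back on that key lines each row up with the previous day's value. `shift_by_join`, `base`, and `step` are hypothetical names.

```python
from datetime import date, timedelta
from typing import Dict, Optional, Tuple


def shift_by_join(
    base: Dict[date, float], step: timedelta
) -> Dict[date, Tuple[float, Optional[float]]]:
    """Pair each value with the value found `step` earlier, via a key join."""
    # Derived table: same values, time key moved forward by `step`.
    derived = {day + step: value for day, value in base.items()}
    # Left join of base with derived on the time key.
    return {day: (value, derived.get(day)) for day, value in base.items()}


base = {
    date(2015, 4, 27): 10.0,
    date(2015, 4, 28): 12.0,
    date(2015, 4, 29): 15.0,
}
joined = shift_by_join(base, timedelta(days=1))
```

Unlike a positional shift, this only matches rows whose keys are exactly `step` apart, so any gap in the time axis yields `None` for the shifted value.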
>> On Wed, Apr 29, 2015 at 1:25 PM, Reynold Xin <r...@databricks.com> wrote:
>>
>>> In this case it's fine to discuss whether this would fit in Spark
>>> DataFrames' high-level direction before putting it in JIRA. Otherwise we
>>> might end up creating a lot of tickets just for querying whether
>>> something might be a good idea.
>>>
>>> About this specific feature -- I'm not sure what it means in general
>>> given we don't have an axis in Spark DataFrames. But I think it'd probably
>>> be good to be able to shift a column by one so we can support the end
>>> time / begin time case, although it'd require two passes over the data.
>>>
>>>
>>>
>>> On Wed, Apr 29, 2015 at 1:08 PM, Nicholas Chammas <
>>> nicholas.cham...@gmail.com> wrote:
>>>
>>> > I can't comment on the direction of the DataFrame API (that's more for
>>> > Reynold or Michael I guess), but I just wanted to point out that the
>>> > JIRA would be the recommended way to create a central place for
>>> > discussing a feature add like that.
>>> >
>>> > Nick
>>> >
>>> > On Wed, Apr 29, 2015 at 3:43 PM Olivier Girardot <
>>> > o.girar...@lateral-thoughts.com> wrote:
>>> >
>>> > > Hi Nicholas,
>>> > > yes I've already checked, and I've just created
>>> > > https://issues.apache.org/jira/browse/SPARK-7247
>>> > > I'm not even sure why this would be a good feature to add except the
>>> > > fact that some of the data scientists I'm working with are using it,
>>> > > and it would therefore be useful for me to translate Pandas code to
>>> > > Spark...
>>> > >
>>> > > Isn't the goal of Spark DataFrames to allow all the features of
>>> > > Pandas/R DataFrames using Spark?
>>> > >
>>> > > Regards,
>>> > >
>>> > > Olivier.
>>> > >
>>> > > On Wed, Apr 29, 2015 at 9:09 PM, Nicholas Chammas <
>>> > > nicholas.cham...@gmail.com> wrote:
>>> > >
>>> > >> You can check JIRA for any existing plans. If there aren't any, then
>>> > >> feel free to create a JIRA and make the case there for why this would
>>> > >> be a good feature to add.
>>> > >>
>>> > >> Nick
>>> > >>
>>> > >> On Wed, Apr 29, 2015 at 7:30 AM Olivier Girardot <
>>> > >> o.girar...@lateral-thoughts.com> wrote:
>>> > >>
>>> > >>> Hi,
>>> > >>> Is there any plan to add the "shift" method from Pandas to Spark
>>> > >>> DataFrames? Not that I think it's an easy task...
>>> > >>>
>>> > >>> cf.
>>> > >>> http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.shift.html
>>> > >>>
>>> > >>> Regards,
>>> > >>>
>>> > >>> Olivier.
>>> > >>>
>>> > >>
>>> >
>>>
>>
>>
