To close this thread: rxin created a broader JIRA to handle window functions in DataFrames: https://issues.apache.org/jira/browse/SPARK-7322
Thanks everyone.
On Wed, Apr 29, 2015 at 22:51, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:

> To give you a broader idea of the current use case, I have a few
> transformations (sorts and column creations) oriented towards a simple goal.
> My data is timestamped, and if two lines are identical, their time
> difference has to be more than X days for the line to be kept, so there
> are a few shifts done, but very locally: only -1 or +1.
>
> FYI regarding JIRA, I created one -
> https://issues.apache.org/jira/browse/SPARK-7247 - associated with this
> discussion.
> @rxin considering that, in my use case, the data is sorted beforehand, there
> might be a better way - but I guess some shuffle would be needed anyway...
>
> On Wed, Apr 29, 2015 at 22:34, Evan R. Sparks <evan.spa...@gmail.com> wrote:
>
>> In general there's a tension between ordered data and the set-oriented data
>> model underlying DataFrames. You can force a total ordering on the data,
>> but it may come at a high cost with respect to performance.
>>
>> It would be good to get a sense of the use case you're trying to support,
>> but one suggestion: I can imagine achieving a similar
>> result by applying a datetime.timedelta (in Python terms) to a time
>> attribute (your "axis") and then performing a join between the base table and
>> this derived table to merge the data back together. This type of join could
>> then be optimized if the use case is frequent enough to warrant it.
>>
>> - Evan
>>
>> On Wed, Apr 29, 2015 at 1:25 PM, Reynold Xin <r...@databricks.com> wrote:
>>
>>> In this case it's fine to discuss whether this would fit in Spark
>>> DataFrames' high level direction before putting it in JIRA. Otherwise we
>>> might end up creating a lot of tickets just for querying whether something
>>> might be a good idea.
>>>
>>> About this specific feature -- I'm not sure what it means in general given
>>> we don't have axis in Spark DataFrames.
>>> But I think it'd probably be good
>>> to be able to shift a column by one so we can support the end time / begin
>>> time case, although it'd require two passes over the data.
>>>
>>> On Wed, Apr 29, 2015 at 1:08 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>>
>>> > I can't comment on the direction of the DataFrame API (that's more for
>>> > Reynold or Michael I guess), but I just wanted to point out that JIRA
>>> > would be the recommended way to create a central place for discussing a
>>> > feature addition like that.
>>> >
>>> > Nick
>>> >
>>> > On Wed, Apr 29, 2015 at 3:43 PM Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
>>> >
>>> > > Hi Nicholas,
>>> > > yes I've already checked, and I've just created
>>> > > https://issues.apache.org/jira/browse/SPARK-7247
>>> > > I'm not even sure why this would be a good feature to add, except for the fact
>>> > > that some of the data scientists I'm working with are using it, and it
>>> > > would therefore be useful for me when translating Pandas code to Spark...
>>> > >
>>> > > Isn't the goal of Spark DataFrames to offer all the features of Pandas/R
>>> > > DataFrames using Spark?
>>> > >
>>> > > Regards,
>>> > >
>>> > > Olivier.
>>> > >
>>> > > On Wed, Apr 29, 2015 at 21:09, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>> > >
>>> > >> You can check JIRA for any existing plans. If there aren't any, then feel
>>> > >> free to create a JIRA and make the case there for why this would be a good
>>> > >> feature to add.
>>> > >>
>>> > >> Nick
>>> > >>
>>> > >> On Wed, Apr 29, 2015 at 7:30 AM Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
>>> > >>
>>> > >>> Hi,
>>> > >>> Is there any plan to add the "shift" method from Pandas to Spark
>>> > >>> DataFrames? Not that I think it's an easy task...
>>> > >>>
>>> > >>> c.f.
>>> > >>>
>>> > >>> http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.shift.html
>>> > >>>
>>> > >>> Regards,
>>> > >>>
>>> > >>> Olivier.
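
For readers following the thread, here is a minimal plain-Python sketch of the use case discussed above: shifting a column by one on already-sorted data, then keeping a row only when the previous row has a different key or is more than X days older. It does not use pandas or Spark. The names `shift`, `keep_rows`, and `min_gap`, the `(key, timestamp)` row layout, and the choice to compare each row with the immediately preceding row are all assumptions made for illustration, not something specified in the thread.

```python
from datetime import datetime, timedelta


def shift(values, periods=1, fill=None):
    """Move `values` by `periods` positions, padding with `fill`.

    Mimics the behaviour of a pandas-style shift on a single column
    (hypothetical helper, not a pandas or Spark API).
    """
    n = len(values)
    if periods == 0:
        return list(values)
    if abs(periods) >= n:
        return [fill] * n
    if periods > 0:
        # Shift forward: the first `periods` slots have no predecessor.
        return [fill] * periods + list(values[:-periods])
    # Shift backward: the last `-periods` slots have no successor.
    return list(values[-periods:]) + [fill] * (-periods)


def keep_rows(rows, min_gap):
    """Filter rows sorted by (key, timestamp).

    A row is kept if the previous row has a different key, or if it is
    more than `min_gap` later than the previous row (assumed rule).
    """
    previous = shift(rows, 1)  # previous row aligned with each row
    kept = []
    for row, before in zip(rows, previous):
        same_key = before is not None and before[0] == row[0]
        if not same_key or row[1] - before[1] > min_gap:
            kept.append(row)
    return kept


# Made-up sample data, already sorted by key then timestamp.
rows = [
    ("a", datetime(2015, 4, 1)),
    ("a", datetime(2015, 4, 2)),   # only 1 day after the previous "a" row
    ("a", datetime(2015, 4, 10)),  # 8 days after the previous "a" row
    ("b", datetime(2015, 4, 3)),
]
kept = keep_rows(rows, min_gap=timedelta(days=3))
```

With this sample data, only the second row is dropped. In Spark itself, the window-function work tracked in SPARK-7322 is the route the thread settled on; a lag or lead over an ordered window would play the role of `shift` here.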