Hi,
I'm trying to implement a folding function in Spark. It takes an input k and
a data frame of ids and dates. For k=1 the result is just the data frame;
for k=2 it contains the min and max date for each id once and every other
date twice; for k=3 it contains the min and max once, min+1 and max-1 twice,
and the rest three times; and so on.
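Concretely, each date ends up repeated once per length-k sliding window that contains it. A toy sketch in plain Python (no Spark; dates assumed sorted and distinct per id) of just the multiplicities:

```python
def fold_counts(dates, k):
    """How many times each date appears after the fold: the windows are
    dates[i:i+k] for each valid start i, so the min and max fall in one
    window, the next-innermost dates in two, and so on (capped at k)."""
    n = len(dates)
    folded = [d for i in range(n - k + 1) for d in dates[i:i + k]]
    return {d: folded.count(d) for d in dates}

# fold_counts([1, 2, 3, 4, 5], 3) -> {1: 1, 2: 2, 3: 3, 4: 2, 5: 1}
```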
My attempt in Scala, with variable names changed:
val acctMinDates = df.groupBy("id").agg(min("thedate"))
val acctMaxDates = df.groupBy("id").agg(max("thedate"))
// each Row is (id, min(thedate), max(thedate)) after the join
val acctDates = acctMinDates.join(acctMaxDates, "id").collect()

var filterString = ""
for (i <- 1 to k - 1) {
  if (i == 1) {
    for (aDate <- acctDates) {
      // quote the date literals so the generated SQL parses
      filterString = filterString + "(id = " + aDate(0) +
        " and thedate > '" + aDate(1) + "' and thedate < '" + aDate(2) + "') or "
    }
    filterString = filterString.substring(0, filterString.size - 4) // drop trailing " or "
  }
  // note: the bounds never narrow, so every pass past the first re-adds the
  // same interior rows; df has to be a var for this reassignment to compile
  df = df.unionAll(df.where(filterString))
}
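With many ids this builds one enormous OR chain, which is presumably what triggers the StackOverflow in the subject line. The predicate itself is only a per-id bounds check, though; a plain-Python sketch of that logic (separate from any Spark API — in Spark the equivalent would be a join against the bounds plus column comparisons rather than a generated SQL string):

```python
def interior_rows(rows, bounds):
    """rows: (id, date) pairs; bounds: id -> (min_date, max_date).
    Keep rows strictly inside their id's date range -- the same predicate
    the generated "(id = ... and thedate > ... and thedate < ...) or ..."
    string encodes, without building one giant OR clause."""
    return [(i, d) for i, d in rows
            if i in bounds and bounds[i][0] < d < bounds[i][1]]

# interior_rows([('a', 1), ('a', 2), ('a', 3), ('a', 4)], {'a': (1, 4)})
# -> [('a', 2), ('a', 3)]
```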
The pandas/Python code I'm trying to translate:

df = pd.concat([df.groupby('id').apply(lambda x: pd.concat(
    [x.iloc[i:i + k] for i in range(len(x.index) - k + 1)]))])
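For reference, that one-liner is, per id, the concatenation of every length-k slice. A plain-Python equivalent of the whole fold (hypothetical helper name, no pandas; assumes each id's dates are distinct and should be taken in sorted order):

```python
from collections import defaultdict

def fold_rows(rows, k):
    """rows: list of (id, date) pairs. Returns each row repeated once per
    length-k sliding window (per id) that contains it, mirroring the
    pandas groupby/concat one-liner."""
    by_id = defaultdict(list)
    for id_, date in rows:
        by_id[id_].append(date)
    out = []
    for id_, dates in by_id.items():
        dates.sort()
        n = len(dates)
        for i in range(n - k + 1):          # one slice per window start
            out.extend((id_, d) for d in dates[i:i + k])
    return out

# fold_rows([('a', 1), ('a', 2), ('a', 3)], 2)
# -> [('a', 1), ('a', 2), ('a', 2), ('a', 3)]
```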
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Large-where-clause-StackOverflow-1-5-2-tp27544.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.