subject:"Split columns in RDD"

Re: Split columns in RDD

2016-01-19 Thread Richard Siebeling

Thanks Daniel, this will certainly help.

Regards,
Richard

On Tue, Jan 19, 2016 at 6:35 PM, Daniel Imberman wrote:
> edit 2: filter should be map
>
> val numColumns = separatedInputStrings.map{ case(id, (stateList, numStates)) => numStates}.reduce(math.max)
>
> On

Re: Split columns in RDD

2016-01-19 Thread Sabarish Sasidharan

The most efficient way to determine the number of columns would be to do a take(1) and split in the driver.

Regards,
Sab

On 19-Jan-2016 8:48 pm, "Richard Siebeling" wrote:
> Hi,
>
> what is the most efficient way to split columns and know how many columns
> are created.
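A minimal sketch of Sab's suggestion in Scala. It assumes the input is an RDD[(Int, String)] of (ID, comma-separated STATE) pairs as in Richard's example. The object FirstRowColumns and its methods are hypothetical names. Only one record is shipped to the driver, where it is split and counted.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper: counts columns by splitting only the first record.
object FirstRowColumns {
  // Split a comma-separated STATE value into trimmed, non-empty parts.
  def splitStates(states: String): Array[String] =
    states.split(",").map(_.trim).filter(_.nonEmpty)

  // take(1) brings at most one record to the driver; the split happens there.
  def columnsFromFirstRow(input: RDD[(Int, String)]): Int =
    input.take(1).headOption match {
      case Some((_, states)) => splitStates(states).length
      case None              => 0 // empty RDD: no columns
    }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("first-row-columns"))
    try {
      val input = sc.parallelize(Seq((1, "TX, NY, FL"), (2, "CA, OH")))
      println(columnsFromFirstRow(input)) // 3: only row 1 is inspected
    } finally sc.stop()
  }
}
```

This approach is cheap because it touches a single record. However, any later row with more states than the first one goes unnoticed.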

Split columns in RDD

2016-01-19 Thread Richard Siebeling

Hi,

What is the most efficient way to split columns and know how many columns are created?

Here is the current RDD:

ID  STATE
1   TX, NY, FL
2   CA, OH

This is the preferred output:

ID  STATE_1  STATE_2
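A minimal sketch of producing the preferred layout. It makes several assumptions:

* The number of columns (numColumns) is already known.
* A row with fewer states gets empty cells, represented here as None.
* Extra states beyond numColumns are dropped.

SplitToColumns, toColumns and header are hypothetical names, not from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SplitToColumns {
  // Split a comma-separated STATE value into trimmed, non-empty parts.
  def splitStates(states: String): Array[String] =
    states.split(",").map(_.trim).filter(_.nonEmpty)

  // Column names for the output: ID, STATE_1 ... STATE_n.
  def header(numColumns: Int): List[String] =
    "ID" :: (1 to numColumns).map(i => s"STATE_$i").toList

  // Give every row exactly numColumns cells; missing states become None
  // and states beyond numColumns are dropped.
  def toColumns(input: RDD[(Int, String)], numColumns: Int): RDD[(Int, List[Option[String]])] =
    input.mapValues { states =>
      val parts = splitStates(states)
      (0 until numColumns).map(i => if (i < parts.length) Some(parts(i)) else None).toList
    }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("split-to-columns"))
    try {
      val input = sc.parallelize(Seq((1, "TX, NY, FL"), (2, "CA, OH")))
      println(header(3).mkString("\t"))
      // Print each row with empty strings for missing cells.
      toColumns(input, 3).collect().sortBy(_._1).foreach { case (id, cells) =>
        println((id.toString :: cells.map(_.getOrElse(""))).mkString("\t"))
      }
    } finally sc.stop()
  }
}
```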

Re: Split columns in RDD

2016-01-19 Thread Daniel Imberman

Hi Richard,

If I understand the question correctly, it sounds like you could probably do this using mapValues. I'm assuming that you want two pieces of information out of all rows: the states as individual items, and the number of states in the row.

val separatedInputStrings = input:RDD[(Int,
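A minimal sketch of what the mapValues step could look like. It reconstructs only what the later messages confirm: each value becomes a (stateList, numStates) pair. The original snippet is truncated in this archive, so the splitting details here (comma separator, trimming) are assumptions. SeparateStates and separate are hypothetical names.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SeparateStates {
  // Turn each "TX, NY, FL" value into (List(TX, NY, FL), 3); the ID key is untouched.
  def separate(input: RDD[(Int, String)]): RDD[(Int, (List[String], Int))] =
    input.mapValues { states =>
      val stateList = states.split(",").map(_.trim).filter(_.nonEmpty).toList
      (stateList, stateList.length)
    }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("separate-states"))
    try {
      val input = sc.parallelize(Seq((1, "TX, NY, FL"), (2, "CA, OH")))
      separate(input).collect().foreach(println)
    } finally sc.stop()
  }
}
```

mapValues leaves the key and any existing partitioner untouched, which is one reason to prefer it over a plain map when only the value changes.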

Re: Split columns in RDD

2016-01-19 Thread Daniel Imberman

edit: Mistake in the second code example

val numColumns = separatedInputStrings.filter{ case(id, (stateList, numStates)) => numStates}.reduce(math.max)

On Tue, Jan 19, 2016 at 8:17 AM Daniel Imberman wrote:
> Hi Richard,
>
> If I understand the question correctly it

Re: Split columns in RDD

2016-01-19 Thread Richard Siebeling

That's true, and that's the way we're doing it now, but then we're only using the first row to determine the number of split columns. It could be that the second (or last) row has 10 new columns, and we'd like to know that too. Probably a reduceby operator can be used to do that, but I'm

Re: Split columns in RDD

2016-01-19 Thread Daniel Imberman

edit 2: filter should be map

val numColumns = separatedInputStrings.map{ case(id, (stateList, numStates)) => numStates}.reduce(math.max)

On Tue, Jan 19, 2016 at 8:19 AM Daniel Imberman wrote:
> edit: Mistake in the second code example
>
> val numColumns =
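The corrected line needs map because filter expects a Boolean predicate. Here each row must instead be turned into its Int count, and then reduce(math.max) keeps the largest. This is a self-contained sketch assuming the same (ID, (stateList, numStates)) shape. MaxColumns and maxColumns are hypothetical names. Unlike take(1), this scans every row, so a wider row anywhere in the data is counted.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MaxColumns {
  // Widest row across the whole RDD. reduce throws on an empty RDD,
  // so callers should guard with isEmpty() if that can happen.
  def maxColumns(separated: RDD[(Int, (List[String], Int))]): Int =
    separated.map { case (_, (_, numStates)) => numStates }.reduce(math.max)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("max-columns"))
    try {
      val separated = sc.parallelize(Seq(
        (1, (List("TX", "NY", "FL"), 3)),
        (2, (List("CA", "OH"), 2))))
      println(maxColumns(separated)) // 3
    } finally sc.stop()
  }
}
```

No key-based operator is needed for this. A plain reduce over the per-row counts already visits every partition and returns a single maximum to the driver.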

