Thanks Daniel, this will certainly help.
Regards, Richard
On Tue, Jan 19, 2016 at 6:35 PM, Daniel Imberman wrote:
> edit 2: filter should be map
>
> val numColumns = separatedInputStrings.map{ case(id, (stateList,
> numStates)) => numStates}.reduce(math.max)
>
> On
The most efficient way to determine the number of columns would be to do a
take(1) and split the result in the driver.
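
A minimal sketch of that suggestion, assuming the sample data from the original question. A local Seq stands in for the RDD so the snippet runs without a cluster (an RDD[(Int, String)] offers take(n) as well), and the helper name splitStates is hypothetical. Note that this only inspects the first row.

```scala
object FirstRowColumns {
  // Stand-in for the RDD in the thread: (ID, comma-separated STATE string).
  // Assumption: a local Seq is used here so the sketch needs no Spark setup.
  val input: Seq[(Int, String)] = Seq(
    (1, "TX, NY, FL"),
    (2, "CA, OH")
  )

  // Hypothetical helper: split on commas and trim surrounding whitespace.
  def splitStates(states: String): Array[String] =
    states.split(",").map(_.trim)

  // Fetch only the first row and count its columns on the driver side.
  val firstRowColumns: Int =
    input.take(1).headOption
      .map { case (_, states) => splitStates(states).length }
      .getOrElse(0) // empty input: no columns

  def main(args: Array[String]): Unit =
    println(firstRowColumns)
}
```

This is cheap because only one row is fetched, but it gives the right answer only if every row has the same number of values.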
Regards
Sab
On 19-Jan-2016 8:48 pm, "Richard Siebeling" wrote:
> Hi,
>
> What is the most efficient way to split columns and find out how many
> columns are created?
>
>
Hi,
What is the most efficient way to split columns and find out how many
columns are created?
Here is the current RDD:
-------------------
ID   STATE
-------------------
1    TX, NY, FL
2    CA, OH
-------------------
This is the preferred output:
-------------------
ID   STATE_1   STATE_2
Hi Richard,
If I understand the question correctly it sounds like you could probably do
this using mapValues (I'm assuming that you want two pieces of information
out of all rows, the states as individual items, and the number of states
in the row)
val separatedInputStrings = input:RDD[(Int,
edit: Mistake in the second code example
val numColumns = separatedInputStrings.filter{ case(id, (stateList,
numStates)) => numStates}.reduce(math.max)
On Tue, Jan 19, 2016 at 8:17 AM Daniel Imberman wrote:
> Hi Richard,
>
> If I understand the question correctly it
That's true, and that's the way we're doing it now, but then we're only using
the first row to determine the number of split columns.
It could be that the second (or last) row has 10 new columns, and
we'd like to know that too.
Probably a reduceby operator can be used to do that, but I'm
edit 2: filter should be map
val numColumns = separatedInputStrings.map{ case(id, (stateList,
numStates)) => numStates}.reduce(math.max)
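
For illustration, here is a hedged end-to-end sketch of this approach on the sample data from the original question. It assumes the (id, (stateList, numStates)) shape implied by the pattern match above, uses a local Seq in place of the RDD so it runs without a cluster, and adds a hypothetical padding step (wideRows) that is not part of the thread. On an RDD, the same map followed by reduce(math.max) applies.

```scala
object WidestRow {
  // Stand-in for the input RDD: (ID, comma-separated STATE string).
  val input: Seq[(Int, String)] = Seq(
    (1, "TX, NY, FL"),
    (2, "CA, OH")
  )

  // Split each row once and keep the count next to the values,
  // giving the (id, (stateList, numStates)) shape matched in the thread.
  val separatedInputStrings: Seq[(Int, (Array[String], Int))] =
    input.map { case (id, states) =>
      val stateList = states.split(",").map(_.trim)
      (id, (stateList, stateList.length))
    }

  // Maximum count over ALL rows, not just the first one.
  // Same fold as reduce(math.max) in the thread, written as an explicit lambda.
  val numColumns: Int =
    separatedInputStrings
      .map { case (_, (_, numStates)) => numStates }
      .reduce((a, b) => math.max(a, b))

  // Hypothetical follow-up: pad shorter rows with "" so every row has
  // numColumns values (STATE_1 .. STATE_n).
  val wideRows: Seq[(Int, Seq[String])] =
    separatedInputStrings.map { case (id, (stateList, _)) =>
      (id, stateList.toSeq.padTo(numColumns, ""))
    }

  def main(args: Array[String]): Unit = {
    println(numColumns)
    wideRows.foreach { case (id, cols) =>
      println((id.toString +: cols).mkString("\t"))
    }
  }
}
```

Because the count is reduced over every row, a later row with more values than the first still determines the column count, which addresses the limitation of looking only at the first row.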
On Tue, Jan 19, 2016 at 8:19 AM Daniel Imberman wrote:
> edit: Mistake in the second code example
>
> val numColumns =