It very much depends on the logic that generates the new rows. If it is
per row (i.e. without context), you can just convert to an RDD and run a
flatMap over it, so that each input row can produce one or more output rows.
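As a rough illustration of the per-row case, the sketch below shows the kind of function you would hand to flatMap: one input row in, a list of output rows out. It is plain Java so it can be tried without a cluster. Rows are modeled as List<Object> instead of Spark's Row, and the expansion rule (_5 = _2 / 4, one child row per 100 units of _2) is invented for illustration. In Spark you would wrap the same logic in a FlatMapFunction and call dataFrame.javaRDD().flatMap(...); note that FlatMapFunction.call returns an Iterable in Spark 1.x and an Iterator in 2.x.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Sketch only: rows are modeled as List<Object> instead of org.apache.spark.sql.Row,
// and the expansion rule below is hypothetical.
class RowExpansion {

    // Turns one input row (_1 = id, _2 = int, _3 = double) into the enriched
    // parent row followed by zero or more child rows.
    static List<List<Object>> expand(List<Object> row) {
        String id = (String) row.get(0);
        int value = (Integer) row.get(1);
        List<List<Object>> out = new ArrayList<>();

        // Parent row: the original three columns plus new columns _4 and _5.
        out.add(Arrays.<Object>asList(id, value, row.get(2), id + " add text", value / 4));

        // Hypothetical rule: one child row per 100 units of _2. Child rows must share
        // the DataFrame schema, so the columns they do not use are null.
        for (int i = 1; i <= value / 100; i++) {
            out.add(Arrays.<Object>asList(id.toLowerCase(Locale.ROOT) + i, null, null, null, i));
        }
        return out;
    }

    // Local stand-in for JavaRDD.flatMap: every input row may yield several output rows.
    static List<List<Object>> expandAll(List<List<Object>> rows) {
        return rows.stream()
                .flatMap(r -> expand(r).stream())
                .collect(Collectors.toList());
    }
}
```

For input rows ID1/100/1.1 and ID2/200/2.2 this yields five rows: the ID1 parent, child id11, the ID2 parent, and children id21 and id22.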

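// If the new rows need context (e.g. all rows sharing an ID), group first.
JavaPairRDD<String, Iterable<Row>> grouped =
    dataFrame.javaRDD().groupBy(row -> row.getString(0)); // key: the ID in _1

// The enclosing method returns JavaPairRDD<String, List<Row>>.
return grouped.mapValues(rowsIt -> {
    List<Row> rows = Lists.newArrayList(rowsIt); // Guava's Lists
    return generateRows(rows); // hypothetical helper: build the new rows here
});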

Then convert back to a DataFrame: flatten the lists with flatMap and pass
the resulting JavaRDD<Row>, together with a StructType schema, to
createDataFrame.
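To make the group / mapValues / flatten pipeline concrete, here is a local, single-machine model of it in plain Java (a sketch, not Spark code). Rows are again modeled as List<Object>, and generateRows is a hypothetical stand-in for your own per-group logic: it keeps the group's rows and appends a summary row with the sum of _2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch only: a local model of groupBy -> mapValues -> flatMap,
// with rows modeled as List<Object> (_1 = id, _2 = int, _3 = double).
class GroupedExpansion {

    // Hypothetical per-group logic: keep the group's rows and append a summary
    // row with the same three columns (_3 left null, so it must be nullable).
    static List<List<Object>> generateRows(String id, List<List<Object>> rows) {
        List<List<Object>> out = new ArrayList<>(rows);
        int total = rows.stream().mapToInt(r -> (Integer) r.get(1)).sum();
        out.add(Arrays.<Object>asList(id + "-total", total, null));
        return out;
    }

    static List<List<Object>> run(List<List<Object>> rows) {
        // groupBy: collect the rows that share the same _1 value, keeping first-seen order.
        Map<String, List<List<Object>>> grouped = rows.stream()
                .collect(Collectors.groupingBy(r -> (String) r.get(0),
                        LinkedHashMap::new, Collectors.toList()));

        // mapValues + flatMap: expand each group, then flatten into one list of rows.
        return grouped.entrySet().stream()
                .flatMap(e -> generateRows(e.getKey(), e.getValue()).stream())
                .collect(Collectors.toList());
    }
}
```

On a cluster the flattened result would be a JavaRDD<Row> (e.g. from mapValues(...).values() followed by flatMap), which createDataFrame(rowRdd, schema) on the SQLContext (SparkSession in 2.x) turns back into a DataFrame; the schema has to declare every column, with the ones left empty in generated rows marked nullable.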
--
Jan Sterba
https://twitter.com/honzasterba | http://flickr.com/honzasterba |
http://500px.com/honzasterba


On Fri, Mar 11, 2016 at 8:49 PM, Michael Armbrust
<mich...@databricks.com> wrote:
> Or look at explode on DataFrame
>
> On Fri, Mar 11, 2016 at 10:45 AM, Stefan Panayotov <spanayo...@msn.com>
> wrote:
>>
>> Hi,
>>
>> I have a problem that requires me to go through the rows in a DataFrame
>> (or possibly through rows in a JSON file) and conditionally add rows
>> depending on a value in one of the columns in each existing row. For
>> example, if I have:
>>
>> +---+---+---+
>> | _1| _2| _3|
>> +---+---+---+
>> |ID1|100|1.1|
>> |ID2|200|2.2|
>> |ID3|300|3.3|
>> |ID4|400|4.4|
>> +---+---+---+
>>
>> I need to be able to get:
>>
>> +---+---+---+--------------------+---+
>> | _1| _2| _3|                  _4| _5|
>> +---+---+---+--------------------+---+
>> |ID1|100|1.1|ID1 add text or d...| 25|
>> |id11 ..|21 |
>> |id12 ..|22 |
>> |ID2|200|2.2|ID2 add text or d...| 50|
>> |id21 ..|33 |
>> |id22 ..|34 |
>> |id23 ..|35 |
>> |ID3|300|3.3|ID3 add text or d...| 75|
>> |id31 ..|11 |
>> |ID4|400|4.4|ID4 add text or d...|100|
>> |id41 ..|51 |
>> |id42 ..|52 |
>> |id43 ..|53 |
>> |id44 ..|54 |
>> +---+---+---+--------------------+---+
>>
>> How can I achieve this in Spark without doing DF.collect(), which brings
>> everything to the driver and, for a big data set, will cause an OOM?
>> BTW, I know how to use withColumn() to add new columns to the DataFrame. I
>> also need to add new rows.
>> Any help will be appreciated.
>>
>> Thanks,
>>
>> Stefan Panayotov, PhD
>>
>
>
