Here is another way to achieve that (in Python):
# The second argument must be a Column expression, not a string
base_df.withColumn("column_name", column_expression_for_new_column)
# To add a new row, create a DataFrame containing the new row and call unionAll()
base_df.unionAll(new_df)

# Another approach: convert to an RDD, add the required fields, and convert
# back to a DataFrame
def update_row(row):
    """Add extra columns according to your logic"""
    # Example: append two placeholder values to the row
    return tuple(row) + ("Text", "number")

updated_row_rdd = base_df.rdd.map(update_row)
# Convert back to a DataFrame, supplying the schema
updated_df = sql_context.createDataFrame(updated_row_rdd, schema)

# To add an extra row, create a new DataFrame with the new row and call unionAll()
result_df = updated_df.unionAll(new_row_df)
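
If each existing row has to produce a variable number of extra rows, one option is a function that returns a list of rows, so that every input row expands into one or more output rows. Below is a minimal sketch in plain Python, not tested against a cluster. The names expand_row and rows_for, the "greater than 150" rule, and the choice to fill unused columns with None are all assumptions made for illustration, not something from the original question.

```python
def expand_row(row, extra_count):
    """Return the widened original row followed by `extra_count` child rows.

    Every returned tuple has five fields so all rows fit a single schema;
    child rows leave the three original columns as None (an assumption).
    """
    parent_id = row[0]
    # Original three fields plus two new columns (placeholder values).
    widened = tuple(row) + ("%s add text" % parent_id, 0)
    children = [
        (None, None, None, "%s%d" % (parent_id.lower(), i), i)
        for i in range(1, extra_count + 1)
    ]
    return [widened] + children


def rows_for(row):
    """Hypothetical rule: rows whose second column exceeds 150 get two children."""
    extra = 2 if row[1] > 150 else 0
    return expand_row(row, extra)


# Local check with plain tuples standing in for DataFrame rows.
sample = [("ID1", 100, 1.1), ("ID2", 200, 2.2)]
expanded = [out for row in sample for out in rows_for(row)]
```

On Spark, a function like rows_for could be passed to flatMap on the underlying RDD, e.g. base_df.rdd.flatMap(rows_for), and the result handed to sql_context.createDataFrame() with a five-field schema whose columns allow nulls. That keeps the work on the executors and avoids collect().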


Thanks,
Bijay

On Fri, Mar 11, 2016 at 11:49 AM, Michael Armbrust <mich...@databricks.com>
wrote:

> Or look at explode on DataFrame
>
> On Fri, Mar 11, 2016 at 10:45 AM, Stefan Panayotov <spanayo...@msn.com>
> wrote:
>
>> Hi,
>>
>> I have a problem that requires me to go through the rows in a DataFrame
>> (or possibly through rows in a JSON file) and conditionally add rows
>> depending on a value in one of the columns in each existing row. So, for
>> example if I have:
>>
>>
>> +---+---+---+
>> | _1| _2| _3|
>> +---+---+---+
>> |ID1|100|1.1|
>> |ID2|200|2.2|
>> |ID3|300|3.3|
>> |ID4|400|4.4|
>> +---+---+---+
>>
>> I need to be able to get:
>>
>>
>> +---+---+---+--------------------+---+
>> | _1| _2| _3|                  _4| _5|
>> +---+---+---+--------------------+---+
>> |ID1|100|1.1|ID1 add text or d...| 25|
>> |id11 ..|21 |
>> |id12 ..|22 |
>> |ID2|200|2.2|ID2 add text or d...| 50|
>> |id21 ..|33 |
>> |id22 ..|34 |
>> |id23 ..|35 |
>> |ID3|300|3.3|ID3 add text or d...| 75|
>> |id31 ..|11 |
>> |ID4|400|4.4|ID4 add text or d...|100|
>> |id41 ..|51 |
>> |id42 ..|52 |
>> |id43 ..|53 |
>> |id44 ..|54 |
>> +---+---+---+--------------------+---+
>>
>> How can I achieve this in Spark without doing DF.collect(), which will
>> get everything to the driver and for a big data set I'll get OOM?
>> BTW, I know how to use withColumn() to add new columns to the DataFrame.
>> I need to also add new rows.
>> Any help will be appreciated.
>>
>> Thanks,
>>
>>
>> *Stefan Panayotov, PhD **Home*: 610-355-0919
>> *Cell*: 610-517-5586
>> *email*: spanayo...@msn.com
>> spanayo...@outlook.com
>> spanayo...@comcast.net
>>
>>
>
>
