Eyal Allweil created DATAFU-154:
-----------------------------------

             Summary: Spark Explode Array method
                 Key: DATAFU-154
                 URL: https://issues.apache.org/jira/browse/DATAFU-154
             Project: DataFu
          Issue Type: New Feature
            Reporter: Eyal Allweil

Spark has an [explode|https://spark.apache.org/docs/latest/api/sql/index.html#explode] method that divides an array into multiple rows. It is sometimes convenient to divide an array into multiple columns instead.

For example, for input

+-----+----------------------------------------+
|label|sentence_arr                            |
+-----+----------------------------------------+
|0.0  |[Hi, I heard, about, Spark]             |
|0.0  |[I wish, Java, could use, case classes] |
|1.0  |[Logistic, regression, models, are neat]|
+-----+----------------------------------------+

the output could be

+-----+----------------------------------------+--------+----------+---------+------------+
|label|sentence_arr                            |token0  |token1    |token2   |token3      |
+-----+----------------------------------------+--------+----------+---------+------------+
|0.0  |[Hi, I heard, about, Spark]             |Hi      |I heard   |about    |Spark       |
|0.0  |[I wish, Java, could use, case classes] |I wish  |Java      |could use|case classes|
|1.0  |[Logistic, regression, models, are neat]|Logistic|regression|models   |are neat    |
+-----+----------------------------------------+--------+----------+---------+------------+

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
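
To make the requested behaviour concrete, here is a minimal plain-Python sketch of the transformation shown in the tables above. It is not Spark code and not a proposed DataFu API: the function name `explode_array`, its parameters, and the choice to pad shorter arrays with `None` are all assumptions made for illustration, since the issue does not say how arrays of unequal length should be handled.

```python
from typing import Any, Dict, List


def explode_array(
    rows: List[Dict[str, Any]], array_col: str, alias: str
) -> List[Dict[str, Any]]:
    """Return copies of `rows` with one extra column per array position.

    Hypothetical helper: the new columns are named alias0, alias1, ...
    """
    # Assumption: the number of new columns is the length of the longest
    # array, and shorter arrays are padded with None.
    width = max((len(row[array_col]) for row in rows), default=0)

    result = []
    for row in rows:
        values = row[array_col]
        new_row = dict(row)  # keep the original columns, including the array
        for i in range(width):
            new_row[f"{alias}{i}"] = values[i] if i < len(values) else None
        result.append(new_row)
    return result


# The input table from the issue description, as a list of row dictionaries.
rows = [
    {"label": 0.0, "sentence_arr": ["Hi", "I heard", "about", "Spark"]},
    {"label": 0.0, "sentence_arr": ["I wish", "Java", "could use", "case classes"]},
    {"label": 1.0, "sentence_arr": ["Logistic", "regression", "models", "are neat"]},
]

exploded = explode_array(rows, "sentence_arr", "token")
```

In Spark itself, a similar result can probably be obtained by selecting `getItem(i)` on the array column once per index and aliasing each selection, but how the number of columns is determined there is left open by this issue.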