Hi there,

I'm experimenting with Spark ML classification in Python and would like to
raise a couple of questions.

We have input data in a format like:

{label: "string", field1: "string", field2: "string", field3:
"array[string]"}

The idea is to build a single text field from specified combinations of these
fields by concatenation, before feeding it to the tokeniser and TF-IDF,

e.g.:  3 x field1 + field2 + concat(field3)

I can easily preprocess the array before the data reaches Spark and turn
field3 into a single string, but I'm wondering if there is value in having a
function in Spark for doing this.

Maybe it would be worth having some array ops along the lines of PostgreSQL's:
http://www.postgresql.org/docs/9.5/static/functions-array.html#ARRAY-FUNCTIONS-TABLE
?

If so, I can probably help implement it with a little guidance from somebody.

Thanks,
Viktor
