From: Mich Talebzadeh <m...@peridale.co.uk>
Cc: user @spark <user@spark.apache.org>
Subject: Re: Hive REGEXP_REPLACE use or equivalent in Spark
You might be better off using the CSV loader in this case.
https://github.com/databricks/spark-csv
Input:
[csingh ~]$ hadoop fs -cat test.csv
360,10/
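A minimal sketch of loading that file with the spark-csv package on Spark 1.x; the HDFS path and the header/inferSchema options here are assumptions for illustration:

```scala
// Sketch only: read test.csv via the spark-csv data source (Spark 1.4+).
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

val sc = new SparkContext("local[*]", "csv-example")
val sqlContext = new SQLContext(sc)
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "false")       // assumption: no header row
  .option("inferSchema", "true")   // let spark-csv guess column types
  .load("test.csv")
df.show()
```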
> *From:* Andrew Ehrlich [mailto:and...@aehrlich.
Sent: 19 February 2016 01:22
To: Mich Talebzadeh <m...@peridale.co.uk>
Cc: User <user@spark.apache.org>
Subject: Re: Hive REGEXP_REPLACE use or equivalent in Spark
Use the scala method .split(",") to split the string into a collection of
strings, and try using .replac
Use the scala method .split(",") to split the string into a collection of
strings, and try using .replaceAll() on the field with the "?" to remove it.
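On a single field that could look like the following (plain Scala; the character class is the same one used in the Hive call, so it also drops the thousands separator):

```scala
// Strip everything that is not a digit or a dot from the field,
// i.e. the stray "?" and the "," thousands separator.
val field = "?2,500.00"
val cleaned = field.replaceAll("[^\\d.]", "")
// cleaned == "2500.00"
```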
On Thu, Feb 18, 2016 at 2:09 PM, Mich Talebzadeh wrote:
Hi,
What is the equivalent of this Hive statement in Spark
select "?2,500.00", REGEXP_REPLACE("?2,500.00",'[^\\d\\.]','');
+-----------+----------+
| _c0       | _c1      |
+-----------+----------+
| ?2,500.00 | 2500.00  |
+-----------+----------+
Basically I want to get rid of the "?" character.
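For completeness, one sketch of an equivalent in Spark 1.x: with a HiveContext the Hive expression should run unchanged, since Spark SQL exposes Hive's regexp_replace (assuming `sc` is an existing SparkContext):

```scala
// Sketch: run the same Hive expression through a HiveContext (Spark 1.x).
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
hiveContext.sql(
  """select "?2,500.00", regexp_replace("?2,500.00", '[^\\d\\.]', '')"""
).show()
```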