What Davies said is correct: the second argument is Hadoop's output format.
Hadoop supports many types of output formats, and each has its own
advantages. Apart from the one specified above, see
https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html
hdfs://192.168.10.130:9000/dev/output/test already exists, so you need
to remove it first.
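
A minimal sketch of one way to remove the existing output directory before
re-running the job, assuming the `hdfs` command-line client is installed and
on the PATH of the machine that submits the job. The helper names
`build_rm_command` and `remove_output_dir` are hypothetical and not part of
Spark or Hadoop; the path is the one from the message above.

```python
import subprocess

# Output path taken from the message above.
OUTPUT_PATH = "hdfs://192.168.10.130:9000/dev/output/test"


def build_rm_command(path):
    """Return the argument list for a recursive HDFS delete (hypothetical helper)."""
    # -r deletes the directory and its contents;
    # -f keeps the command from failing if the path does not exist.
    return ["hdfs", "dfs", "-rm", "-r", "-f", path]


def remove_output_dir(path):
    """Run the delete; raises subprocess.CalledProcessError on a non-zero exit."""
    # Assumption: the `hdfs` CLI is available where this script runs.
    subprocess.check_call(build_rm_command(path))


# Call this before the save step of the job, for example:
# remove_output_dir(OUTPUT_PATH)
```

Running `hdfs dfs -rm -r` by hand against the same path should have the same
effect; wrapping it in the script only saves a manual step between runs.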
On Tue, Apr 26, 2016 at 5:28 AM, Luke Adolph wrote:
> Hi, all:
> Below is my code:
>
> from pyspark import *
> import re
>
> def getDateByLine(input_str):
>     str_pattern =
Hi, all:
Below is my code:
from pyspark import *
import re

def getDateByLine(input_str):
    str_pattern = r'^\d{4}-\d{2}-\d{2}'
    pattern = re.compile(str_pattern)
    match = pattern.match(input_str)
    if match:
        return match.group()
    else:
        return None

file_url =