Re: Python Hadoop Example

2019-06-17 Thread Nascimento, Rodrigo
Wei-Chiu,

I see people using Python with Spark (PySpark).

{
  "Name"  : "Rodrigo Nascimento",
  "Title" : "Solutions Architect – Open Ecosystems"
}

From: Wei-Chiu Chuang 
Date: Sunday, June 16, 2019 at 2:01 PM
To: Artem Ervits 
Cc: Mike IT Expert, user
Subject: Re: Python Hadoop Example

Thanks Artem,
Looks interesting. I honestly didn't know what the Hadoop Streaming API was
used for.
Here are more references:
https://hadoop.apache.org/docs/r3.2.0/hadoop-streaming/HadoopStreaming.html

I think this brings up another question: how do we treat Python as a
first-class citizen? Especially for data science use cases, Python is *the*
language.
For example, we have Java, C, and (in Hadoop 3.2) C++ clients for HDFS.
But Hadoop does not ship a Python client.
I see a number of Python libraries that support WebHDFS. It's not clear to me
how well they perform, or whether they support more advanced features like
encryption/Kerberos.

The NFS gateway is a possibility. Fuse-dfs is another option. But we know they
don't work at scale, and the community seems to have lost steam for improving
NFS/fuse-dfs.

Thoughts?

On Sun, Jun 16, 2019 at 6:52 AM Artem Ervits <artemerv...@gmail.com> wrote:
https://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
On Sun, Jun 16, 2019, 9:18 AM Mike IT Expert <mikeitexp...@gmail.com> wrote:
Please let me know where I can find a good/simple example of MapReduce Python
code running on Hadoop. Something like a tutorial.

Thank you




Re: Python Hadoop Example

2019-06-17 Thread Sebastian Nagel
(more up-to-date references)
 https://mrjob.readthedocs.io/en/stable/
 https://github.com/Yelp/mrjob

On 6/17/19 2:28 PM, Sebastian Nagel wrote:
> You may also have a look at mrjob:
>    https://pythonhosted.org/mrjob/
> 
> Seb
> 
> 
> On 6/16/19 7:47 PM, Mike IT Expert wrote:
>> Please let me know where I can find a good/simple example of MapReduce
>> Python code running on Hadoop. Something like a tutorial.
>>
>> Thank you
>>
>>
> 


-
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org



Re: Python Hadoop Example

2019-06-17 Thread Sebastian Nagel
You may also have a look at mrjob:
   https://pythonhosted.org/mrjob/

Seb


On 6/16/19 7:47 PM, Mike IT Expert wrote:
> Please let me know where I can find a good/simple example of MapReduce
> Python code running on Hadoop. Something like a tutorial.
> 
> Thank you
> 
> 





Re: Python Hadoop Example

2019-06-16 Thread Hariharan Iyer
Hi Wei-Chiu,

You can look at Dask [1]. It can work with HDFS [2] and also integrates well
with YARN [3].

1 - https://dask.org
2 - http://docs.dask.org/en/latest/remote-data-services.html
3 - http://yarn.dask.org/en/latest/

Thanks,
Hari


On Sun, 16 Jun 2019, 23:31 Wei-Chiu Chuang wrote:

> Thanks Artem,
> Looks interesting. I honestly didn't know what the Hadoop Streaming API was
> used for.
> Here are more references:
> https://hadoop.apache.org/docs/r3.2.0/hadoop-streaming/HadoopStreaming.html
>
> I think this brings up another question: how do we treat Python as a
> first-class citizen? Especially for data science use cases, Python is *the*
> language.
> For example, we have Java, C, and (in Hadoop 3.2) C++ clients for HDFS.
> But Hadoop does not ship a Python client.
> I see a number of Python libraries that support WebHDFS. It's not clear to
> me how well they perform, or whether they support more advanced features like
> encryption/Kerberos.
>
> The NFS gateway is a possibility. Fuse-dfs is another option. But we know they
> don't work at scale, and the community seems to have lost steam for improving
> NFS/fuse-dfs.
>
> Thoughts?
>
> On Sun, Jun 16, 2019 at 6:52 AM Artem Ervits wrote:
>
>>
>> https://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
>>
>> On Sun, Jun 16, 2019, 9:18 AM Mike IT Expert wrote:
>>
>>> Please let me know where I can find a good/simple example of MapReduce
>>> Python code running on Hadoop. Something like a tutorial.
>>>
>>> Thank you
>>>
>>>
>>>


Re: Python Hadoop Example

2019-06-16 Thread Sunil Jain
Hadoop: The Definitive Guide. You can find some Python code in this book.

On Sun, 16 Jun 2019, 18:48 Mike IT Expert wrote:

> Please let me know where I can find a good/simple example of MapReduce
> Python code running on Hadoop. Something like a tutorial.
>
> Thank you
>
>
>


Re: Python Hadoop Example

2019-06-16 Thread Wei-Chiu Chuang
Thanks Artem,
Looks interesting. I honestly didn't know what the Hadoop Streaming API was
used for.
Here are more references:
https://hadoop.apache.org/docs/r3.2.0/hadoop-streaming/HadoopStreaming.html
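
To make the Streaming model concrete, here is a minimal word-count sketch.
Hadoop Streaming pipes input lines to a mapper over stdin, sorts the mapper's
tab-separated key/value output by key, and pipes the sorted records to a
reducer. The file name wordcount.py, the function names, and the map/reduce
command-line argument are assumptions of this sketch, not anything Hadoop
prescribes.

```python
#!/usr/bin/env python3
"""Word count for Hadoop Streaming (illustrative sketch)."""
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Sum the counts of consecutive records that share a key.

    The framework sorts mapper output by key before the reducer reads it,
    so equal keys arrive next to each other and groupby is sufficient.
    """
    records = (
        line.rstrip("\n").split("\t", 1)
        for line in sorted_lines
        if line.strip()  # skip blank lines
    )
    for word, group in groupby(records, key=lambda rec: rec[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical convention for this sketch: 'wordcount.py map' runs the
# mapper, any other argument runs the reducer.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    sys.stdout.writelines(record + "\n" for record in step(sys.stdin))
```

Without a cluster, the same flow can be simulated in a shell with
cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce.
On a cluster, the script would be handed to the hadoop-streaming jar through
its -files, -mapper, -reducer, -input and -output options; the jar's location
depends on the installation.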

I think this brings up another question: how do we treat Python as a
first-class citizen? Especially for data science use cases, Python is *the*
language.
For example, we have Java, C, and (in Hadoop 3.2) C++ clients for HDFS.
But Hadoop does not ship a Python client.
I see a number of Python libraries that support WebHDFS. It's not clear to
me how well they perform, or whether they support more advanced features like
encryption/Kerberos.
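
For a sense of what such libraries wrap: WebHDFS is a plain REST API, where
each operation is an HTTP request against /webhdfs/v1/<path>?op=... on the
NameNode. The sketch below only builds those URLs with the standard library
and makes no network calls. The helper name webhdfs_url, the host
namenode.example.com, and the user alice are made up for illustration; 9870 is
the default NameNode HTTP port in Hadoop 3, and a given cluster may use a
different one.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=9870, params=None):
    """Build a WebHDFS URL: http://host:port/webhdfs/v1/<path>?op=<OP>&..."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # 'op' selects the operation (e.g. OPEN, LISTSTATUS); extra query
    # parameters such as user.name are appended after it.
    query = urlencode({"op": op, **(params or {})})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical host and user, for illustration only.
url = webhdfs_url(
    "namenode.example.com",
    "/user/alice/data.csv",
    "OPEN",
    params={"user.name": "alice"},
)
print(url)
```

On a cluster with simple authentication, a URL like this can be fetched with
any HTTP client. A Kerberized cluster would additionally need SPNEGO
negotiation, which this sketch does not attempt.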

The NFS gateway is a possibility. Fuse-dfs is another option. But we know they
don't work at scale, and the community seems to have lost steam for improving
NFS/fuse-dfs.

Thoughts?

On Sun, Jun 16, 2019 at 6:52 AM Artem Ervits  wrote:

>
> https://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
>
> On Sun, Jun 16, 2019, 9:18 AM Mike IT Expert wrote:
>
>> Please let me know where I can find a good/simple example of MapReduce
>> Python code running on Hadoop. Something like a tutorial.
>>
>> Thank you
>>
>>
>>


Re: Python Hadoop Example

2019-06-16 Thread Artem Ervits
https://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/

On Sun, Jun 16, 2019, 9:18 AM Mike IT Expert  wrote:

> Please let me know where I can find a good/simple example of MapReduce
> Python code running on Hadoop. Something like a tutorial.
>
> Thank you
>
>
>