Re: Python Hadoop Example
Wei-Chiu,

I see people using Python with Spark (PySpark).

{ "Name" : "Rodrigo Nascimento", "Title" : "Solutions Architect – Open Ecosystems" }

From: Wei-Chiu Chuang
Date: Sunday, June 16, 2019 at 2:01 PM
To: Artem Ervits
Cc: Mike IT Expert, user
Subject: Re: Python Hadoop Example

Thanks Artem,

Looks interesting. I honestly didn't know what the Hadoop Streaming API is used for.
Here are more references:
https://hadoop.apache.org/docs/r3.2.0/hadoop-streaming/HadoopStreaming.html

I think it brings up another question: how do we treat Python as a first-class citizen? Especially for data science use cases, Python is *the* language.
For example, we have Java and C and (in Hadoop 3.2) C++ clients for HDFS. But Hadoop does not ship a Python client.
I see a number of Python libraries that support WebHDFS. It's not clear to me how well they perform, or whether they support more advanced features like encryption/Kerberos.

The NFS gateway is a possibility. Fuse-dfs is another option. But we know they don't work at scale, and the community seems to have lost steam on improving NFS/fuse-dfs.

Thoughts?

On Sun, Jun 16, 2019 at 6:52 AM Artem Ervits <artemerv...@gmail.com> wrote:

https://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/

On Sun, Jun 16, 2019, 9:18 AM Mike IT Expert <mikeitexp...@gmail.com> wrote:

Please let me know where I can find a good/simple example of MapReduce Python code running on Hadoop. Like a tutorial or something.

Thank you
Re: Python Hadoop Example
(more up-to-date references)
https://mrjob.readthedocs.io/en/stable/
https://github.com/Yelp/mrjob

On 6/17/19 2:28 PM, Sebastian Nagel wrote:
> You may also have a look at mrjob:
> https://pythonhosted.org/mrjob/
>
> Seb
>
> On 6/16/19 7:47 PM, Mike IT Expert wrote:
>> Please let me know where I can find a good/simple example of MapReduce Python code running on Hadoop. Like a tutorial or something.
>>
>> Thank you

To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org
Re: Python Hadoop Example
You may also have a look at mrjob:
https://pythonhosted.org/mrjob/

Seb

On 6/16/19 7:47 PM, Mike IT Expert wrote:
> Please let me know where I can find a good/simple example of MapReduce Python code running on Hadoop. Like a tutorial or something.
>
> Thank you
Re: Python Hadoop Example
Hi Wei-Chiu,

You can look at Dask [1]. It can work with HDFS [2] and also integrates well with YARN [3].

1 - https://dask.org
2 - http://docs.dask.org/en/latest/remote-data-services.html
3 - http://yarn.dask.org/en/latest/

Thanks,
Hari

On Sun, 16 Jun 2019, 23:31 Wei-Chiu Chuang wrote:
> Thanks Artem,
>
> Looks interesting. I honestly didn't know what the Hadoop Streaming API is used for.
> Here are more references:
> https://hadoop.apache.org/docs/r3.2.0/hadoop-streaming/HadoopStreaming.html
>
> I think it brings up another question: how do we treat Python as a first-class citizen? Especially for data science use cases, Python is *the* language.
> For example, we have Java and C and (in Hadoop 3.2) C++ clients for HDFS. But Hadoop does not ship a Python client.
> I see a number of Python libraries that support WebHDFS. It's not clear to me how well they perform, or whether they support more advanced features like encryption/Kerberos.
>
> The NFS gateway is a possibility. Fuse-dfs is another option. But we know they don't work at scale, and the community seems to have lost steam on improving NFS/fuse-dfs.
>
> Thoughts?
>
> On Sun, Jun 16, 2019 at 6:52 AM Artem Ervits wrote:
>> https://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
>>
>> On Sun, Jun 16, 2019, 9:18 AM Mike IT Expert wrote:
>>> Please let me know where I can find a good/simple example of MapReduce Python code running on Hadoop. Like a tutorial or something.
>>>
>>> Thank you
Re: Python Hadoop Example
Hadoop: The Definitive Guide. You can find some Python code in this book.

On Sun, 16 Jun 2019, 18:48 Mike IT Expert wrote:
> Please let me know where I can find a good/simple example of MapReduce Python code running on Hadoop. Like a tutorial or something.
>
> Thank you
Re: Python Hadoop Example
Thanks Artem,

Looks interesting. I honestly didn't know what the Hadoop Streaming API is used for.
Here are more references:
https://hadoop.apache.org/docs/r3.2.0/hadoop-streaming/HadoopStreaming.html

I think it brings up another question: how do we treat Python as a first-class citizen? Especially for data science use cases, Python is *the* language.
For example, we have Java and C and (in Hadoop 3.2) C++ clients for HDFS. But Hadoop does not ship a Python client.
I see a number of Python libraries that support WebHDFS. It's not clear to me how well they perform, or whether they support more advanced features like encryption/Kerberos.

The NFS gateway is a possibility. Fuse-dfs is another option. But we know they don't work at scale, and the community seems to have lost steam on improving NFS/fuse-dfs.

Thoughts?

On Sun, Jun 16, 2019 at 6:52 AM Artem Ervits wrote:
> https://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
>
> On Sun, Jun 16, 2019, 9:18 AM Mike IT Expert wrote:
>> Please let me know where I can find a good/simple example of MapReduce Python code running on Hadoop. Like a tutorial or something.
>>
>> Thank you
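
For readers wondering what the WebHDFS route mentioned above looks like from Python, here is a minimal sketch that only builds the REST URL, using nothing beyond the standard library. The helper name `webhdfs_url`, the hostname, and the user name are hypothetical; port 9870 assumes the default NameNode HTTP port of a Hadoop 3 cluster. Kerberos (SPNEGO) and encryption are not covered here.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, params=None):
    """Build a WebHDFS REST URL: http://<host>:<port>/webhdfs/v1/<path>?op=<OP>.

    `webhdfs_url` is a hypothetical helper for this sketch, not part of Hadoop.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # `op` selects the operation (e.g. OPEN, LISTSTATUS); extra query
    # parameters such as user.name are appended after it.
    query = urlencode({"op": op, **(params or {})})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical host and user; 9870 is assumed to be the NameNode HTTP port.
read_url = webhdfs_url("namenode.example.com", 9870, "/user/mike/input.txt", "OPEN")
list_url = webhdfs_url(
    "namenode.example.com", 9870, "/user/mike", "LISTSTATUS", {"user.name": "mike"}
)
```

On an unsecured cluster, a URL like `read_url` could then be fetched with `urllib.request.urlopen`; a Kerberized cluster would need an HTTP client that can negotiate SPNEGO, which is exactly the open question raised in the message.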
Re: Python Hadoop Example
https://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/

On Sun, Jun 16, 2019, 9:18 AM Mike IT Expert wrote:
> Please let me know where I can find a good/simple example of MapReduce Python code running on Hadoop. Like a tutorial or something.
>
> Thank you
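
As a rough illustration of the kind of example asked for here, the following is a minimal word-count sketch in the Hadoop Streaming style: the mapper emits tab-separated `word<TAB>1` records, and the reducer sums counts for records that arrive sorted by key. It is not taken from the linked tutorial; the function names and the `run_locally` helper are assumptions made for this sketch, and the helper only simulates the map, sort, and reduce steps in one process.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts per word.

    Assumes records arrive sorted by key, which is what the sort step
    between map and reduce provides.
    """
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Simulate map -> sort -> reduce in one process (no cluster needed)."""
    return list(reducer(sorted(mapper(lines))))


counts = run_locally(["the quick fox", "the lazy dog the"])
```

With Hadoop Streaming itself, each function would live in its own script that reads `sys.stdin` and prints one record per line, and the two scripts would be passed to the streaming jar through its `-mapper` and `-reducer` options.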