Re: Python Hadoop Example
Wei-Chiu,

I see people using Python with Spark (PySpark).

{ "Name" : "Rodrigo Nascimento", "Title" : "Solutions Architect – Open Ecosystems" }

From: Wei-Chiu Chuang
Date: Sunday, June 16, 2019 at 2:01 PM
To: Artem Ervits
Cc: Mike IT Expert, user
Subject: Re: Python Hadoop Example

Thanks Artem,

Looks interesting. I honestly didn't know what the Hadoop Streaming API is used for. Here are more references:
https://hadoop.apache.org/docs/r3.2.0/hadoop-streaming/HadoopStreaming.html

I think it brings up another question: how do we treat Python as a first-class citizen? Especially for data science use cases, Python is *the* language. For example, we have Java and C and (in Hadoop 3.2) C++ clients for HDFS, but Hadoop does not ship a Python client. I see a number of Python libraries that support WebHDFS. It's not clear to me how well they perform, or whether they support more advanced features like encryption/Kerberos.

The NFS gateway is a possibility. Fuse-dfs is another option. But we know they don't work at scale, and the community seems to have lost steam for improving NFS/fuse-dfs.

Thoughts?

On Sun, Jun 16, 2019 at 6:52 AM Artem Ervits <artemerv...@gmail.com> wrote:

https://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/

On Sun, Jun 16, 2019, 9:18 AM Mike IT Expert <mikeitexp...@gmail.com> wrote:

Please let me know where I can find a good/simple example of MapReduce Python code running on Hadoop, like a tutorial or something.

Thank you
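For the kind of simple MapReduce Python code asked about in this thread, here is a minimal word-count sketch. It assumes the Hadoop Streaming model, in which the mapper and reducer are ordinary programs that read lines on stdin and write tab-separated key/value lines on stdout. The function names (`map_lines`, `reduce_lines`, `run_locally`), the word-count task, and the local `sorted()` call standing in for Hadoop's shuffle-and-sort are illustrative assumptions, not code taken from the linked tutorial.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts of consecutive records that share a key.

    Relies on the records arriving sorted by key, which is what the
    framework provides between the map and reduce phases.
    """
    pairs = (
        line.rstrip("\n").split("\t", 1)
        for line in sorted_lines
        if line.strip()  # ignore blank lines
    )
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Simulate map -> sort -> reduce in one process (no cluster needed)."""
    return list(reduce_lines(sorted(map_lines(lines))))


if __name__ == "__main__":
    # Hypothetical command-line roles: "map" and "reduce" read stdin;
    # with no argument a small built-in demo runs instead.
    role = sys.argv[1] if len(sys.argv) > 1 else "demo"
    if role == "map":
        records = map_lines(sys.stdin)
    elif role == "reduce":
        records = reduce_lines(sys.stdin)
    else:
        records = run_locally(["to be or not to be"])
    for record in records:
        print(record)
```

On a cluster, the two roles would typically be handed to the streaming jar through its `-mapper` and `-reducer` options, together with `-input` and `-output` paths; the jar location depends on the installation. Keeping the logic in plain generator functions means the same code can be checked locally before it is submitted.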
RE: hdfs fsck -locations
I'm not seeing the locations flag yet.

Rod Nascimento
Systems Engineer @ Brazil
People don't buy WHAT you do. They buy WHY you do it.

From: Mark Kerzner [mailto:mark.kerz...@shmsoft.com]
Sent: Friday, January 24, 2014 3:16 PM
To: Hadoop User
Subject: Re: hdfs fsck -locations

Sorry, did not copy the full command

hdfs fsck /user/mark/data/word_count.csv -locations
Connecting to namenode via http://mark-7:50070
FSCK started by mark (auth:SIMPLE) from /192.168.1.232 for path /user/mark/data/word_count.csv at Fri Jan 24 11:15:17 CST 2014
.Status: HEALTHY
Total size: 7217 B
Total dirs: 0
Total files: 1
Total blocks (validated): 1 (avg. block size 7217 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Fri Jan 24 11:15:17 CST 2014 in 1 milliseconds

The filesystem under path '/user/mark/data/word_count.csv' is HEALTHY

On Fri, Jan 24, 2014 at 11:08 AM, Harsh J <ha...@cloudera.com> wrote:

Sorry, but what was the question? I also do not see a locations option flag.

On Jan 24, 2014 7:17 PM, Mark Kerzner <mark.kerz...@shmsoft.com> wrote:

Here is an example

hdfs fsck /user/mark/data/word_count.csv
Connecting to namenode via http://mark-7:50070
FSCK started by mark (auth:SIMPLE) from /192.168.1.232 for path /user/mark/data/word_count.csv at Fri Jan 24 07:45:24 CST 2014
.Status: HEALTHY
Total size: 7217 B
Total dirs: 0
Total files: 1
Total blocks (validated): 1 (avg. block size 7217 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Fri Jan 24 07:45:24 CST 2014 in 0 milliseconds

On Fri, Jan 24, 2014 at 4:34 AM, Harsh J <ha...@cloudera.com> wrote:

Hi Mark,

Yes, the locations are shown as IP.

On Fri, Jan 24, 2014 at 12:09 AM, Mark Kerzner <mark.kerz...@shmsoft.com> wrote:

Hi,

Is hdfs fsck -locations supposed to show every block with its location? Is the location the IP of the datanode?

Thank you,
Mark

--
Harsh J
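A possible reason no block locations show up in the output above: the usage string in the HDFS commands guide lists `-locations` as a sub-option nested under `-files -blocks` (`hdfs fsck <path> [-files [-blocks [-locations | -racks]]]`), so passing `-locations` alone may print only the summary. The Python sketch below builds the combined command. It assumes the `hdfs` binary is on the PATH, and the helper names `fsck_command` and `run_fsck` are made up for illustration; they are not part of any Hadoop API.

```python
import subprocess


def fsck_command(path, locations=True):
    """Build the argv for `hdfs fsck` (hypothetical helper, not a Hadoop API)."""
    cmd = ["hdfs", "fsck", path]
    if locations:
        # -locations is documented as nested under -files -blocks,
        # so all three flags are passed together.
        cmd += ["-files", "-blocks", "-locations"]
    return cmd


def run_fsck(path, locations=True):
    """Run fsck and return its stdout as text (assumes `hdfs` is on the PATH)."""
    result = subprocess.run(
        fsck_command(path, locations),
        capture_output=True,
        text=True,
        check=False,  # interpreting the exit status is left to the caller
    )
    return result.stdout


# Building the command needs no cluster:
print(" ".join(fsck_command("/user/mark/data/word_count.csv")))
```

Only `fsck_command` runs here; `run_fsck` is shown for completeness and needs a reachable NameNode to return anything useful.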
RE: hdfs fsck -locations
Hi Mark,

It is a sample from my sandbox. Your question is about the part that is in RED in the output below, right?

[root@sandbox ~]# hdfs fsck /user/ambari-qa/passwd -locations
Connecting to namenode via http://sandbox.hortonworks.com:50070
FSCK started by root (auth:SIMPLE) from /172.16.13.30 for path /user/ambari-qa/passwd at Fri Jan 24 09:53:43 PST 2014
.
/user/ambari-qa/passwd: Under replicated BP-1578958328-10.0.2.15-1382306880516:blk_1073742464_1640. Target Replicas is 3 but found 1 replica(s).
Status: HEALTHY
Total size: 1708 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 1708 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 1 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 2 (66.64 %)
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Fri Jan 24 09:53:43 PST 2014 in 1 milliseconds

The filesystem under path '/user/ambari-qa/passwd' is HEALTHY
[root@sandbox ~]#

Rod Nascimento

From: Nascimento, Rodrigo [rodrigo.nascime...@netapp.com]
Sent: Friday, January 24, 2014 3:34 PM
To: user@hadoop.apache.org
Subject: RE: hdfs fsck -locations

I’m not seeing the locations flag yet.

Rod Nascimento
Systems Engineer @ Brazil
People don’t buy WHAT you do. They buy WHY you do it.
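The summary block in the sandbox output (for example `Under-replicated blocks: 1 (100.0 %)` with a replication factor of 3 on a single datanode) can be pulled into a dictionary for scripted checks. This is a rough sketch that assumes the one-field-per-line layout fsck prints on a terminal and treats everything between the `Status:` line and the `FSCK ended` line as `key: value` pairs. The function names `parse_fsck_summary` and `leading_int` are illustrative, and the sample text is abridged from the output in this thread.

```python
def parse_fsck_summary(output):
    """Return the 'key: value' pairs printed between 'Status:' and 'FSCK ended'."""
    fields = {}
    in_summary = False
    for raw in output.splitlines():
        # fsck may print progress dots directly before "Status:".
        line = raw.strip().lstrip(".").strip()
        if line.startswith("Status:"):
            in_summary = True
        elif line.startswith("FSCK ended"):
            break
        if in_summary and ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


def leading_int(value):
    """Extract the integer at the start of a value such as '2 (66.64 %)'."""
    return int(value.split()[0])


SAMPLE = """\
Connecting to namenode via http://sandbox.hortonworks.com:50070
/user/ambari-qa/passwd: Under replicated. Target Replicas is 3 but found 1 replica(s).
Status: HEALTHY
Total size: 1708 B
Under-replicated blocks: 1 (100.0 %)
Default replication factor: 3
Missing replicas: 2 (66.64 %)
Number of data-nodes: 1
FSCK ended at Fri Jan 24 09:53:43 PST 2014 in 1 milliseconds
"""

summary = parse_fsck_summary(SAMPLE)
if leading_int(summary["Under-replicated blocks"]) > 0:
    print("under-replicated blocks:", summary["Under-replicated blocks"])
```

Lines before `Status:` are skipped on purpose, since the `Connecting to namenode` and per-file lines also contain colons and would otherwise be misread as summary fields.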