Re: Python Hadoop Example

2019-06-17 Thread Nascimento, Rodrigo
Wei-Chiu,

I see people using python with Spark (pySpark).

{
  "Name"  : "Rodrigo Nascimento",
  "Title" : "Solutions Architect – Open Ecosystems"
}

From: Wei-Chiu Chuang 
Date: Sunday, June 16, 2019 at 2:01 PM
To: Artem Ervits 
Cc: Mike IT Expert , user 
Subject: Re: Python Hadoop Example


Thanks Artem,
Looks interesting. I honestly didn't know what Hadoop Streaming API is used for.
Here are more references: 
https://hadoop.apache.org/docs/r3.2.0/hadoop-streaming/HadoopStreaming.html
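
For anyone landing on this thread looking for the example itself, here is a minimal word count sketch in the style Hadoop Streaming expects: the mapper and reducer are ordinary programs that read lines on stdin and write tab-separated key/value records on stdout. The file name wordcount.py and the map/reduce argument convention are choices made for illustration only, not anything Hadoop mandates.

```python
#!/usr/bin/env python3
"""Word count for Hadoop Streaming (illustrative sketch).

Run as `wordcount.py map` or `wordcount.py reduce`; both read stdin.
"""
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Sum the counts per word.

    Relies on the input being sorted by key, which is how Streaming
    delivers records to a reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Only act as a command-line filter when asked for a specific step.
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

It can be dry-run without a cluster as `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`. On a cluster the same two commands are handed to the streaming jar through its -mapper and -reducer options, as described in the page above.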

I think it brings up another question: how do we treat Python as a first-class 
citizen? Especially for data science use cases, Python is *the* language.
For example, we have Java, C, and (in Hadoop 3.2) C++ clients for HDFS. But 
Hadoop does not ship a Python client.
I see a number of Python libraries that support webhdfs. It's not clear to me 
how well they perform, and if they support more advanced features like 
encryption/Kerberos.
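
To make the WebHDFS option concrete: it is a plain HTTP/REST interface on the NameNode, so a directory listing needs nothing beyond the standard library. The sketch below is an assumption-laden illustration, not a recommendation of any particular client library: the helper names are made up, it uses simple (non-Kerberos) authentication through the user.name query parameter, and the host and port have to be filled in for your own cluster (the NameNode HTTP port defaults to 50070 on Hadoop 2 and 9870 on Hadoop 3).

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL for an absolute HDFS path and an operation."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def list_status(host, port, path, user):
    """Return the entry names under `path` (needs a reachable NameNode)."""
    url = webhdfs_url(host, port, path, "LISTSTATUS", **{"user.name": user})
    with urlopen(url) as resp:
        statuses = json.load(resp)["FileStatuses"]["FileStatus"]
    return [status["pathSuffix"] for status in statuses]
```

This says nothing about throughput or about Kerberos and encryption zones, which are exactly the open questions above; it only shows how thin the protocol layer is.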

The NFS gateway is a possibility. Fuse-dfs is another option. But we know they 
don't work at scale, and the community seems to have lost steam on improving 
NFS/fuse-dfs.

Thoughts?

On Sun, Jun 16, 2019 at 6:52 AM Artem Ervits 
<artemerv...@gmail.com> wrote:
https://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
On Sun, Jun 16, 2019, 9:18 AM Mike IT Expert 
<mikeitexp...@gmail.com> wrote:
Please let me know where I can find a good, simple example of MapReduce Python 
code running on Hadoop, such as a tutorial or something similar.

Thank you




RE: hdfs fsck -locations

2014-01-24 Thread Nascimento, Rodrigo
I'm not seeing locations flag yet.

Rod Nascimento
Systems Engineer @ Brazil

People don't buy WHAT you do. They buy WHY you do it.

From: Mark Kerzner [mailto:mark.kerz...@shmsoft.com]
Sent: Friday, January 24, 2014 3:16 PM
To: Hadoop User
Subject: Re: hdfs fsck -locations

Sorry, did not copy the full command

hdfs fsck /user/mark/data/word_count.csv -locations
Connecting to namenode via http://mark-7:50070
FSCK started by mark (auth:SIMPLE) from /192.168.1.232 
for path /user/mark/data/word_count.csv at Fri Jan 24 11:15:17 CST 2014
.Status: HEALTHY
 Total size: 7217 B
 Total dirs: 0
 Total files: 1
 Total blocks (validated): 1 (avg. block size 7217 B)
 Minimally replicated blocks: 1 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 1
 Average block replication: 1.0
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 1
 Number of racks: 1
FSCK ended at Fri Jan 24 11:15:17 CST 2014 in 1 milliseconds


The filesystem under path '/user/mark/data/word_count.csv' is HEALTHY


On Fri, Jan 24, 2014 at 11:08 AM, Harsh J 
<ha...@cloudera.com> wrote:

Sorry, but what was the question? I also do not see a locations option flag.
On Jan 24, 2014 7:17 PM, Mark Kerzner 
<mark.kerz...@shmsoft.com> wrote:
Here is an example

 hdfs fsck /user/mark/data/word_count.csv
Connecting to namenode via http://mark-7:50070
FSCK started by mark (auth:SIMPLE) from /192.168.1.232 
for path /user/mark/data/word_count.csv at Fri Jan 24 07:45:24 CST 2014
.Status: HEALTHY
 Total size: 7217 B
 Total dirs: 0
 Total files: 1
 Total blocks (validated): 1 (avg. block size 7217 B)
 Minimally replicated blocks: 1 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 1
 Average block replication: 1.0
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 1
 Number of racks: 1
FSCK ended at Fri Jan 24 07:45:24 CST 2014 in 0 milliseconds


On Fri, Jan 24, 2014 at 4:34 AM, Harsh J 
<ha...@cloudera.com> wrote:
Hi Mark,

Yes, the locations are shown as IP.

On Fri, Jan 24, 2014 at 12:09 AM, Mark Kerzner 
<mark.kerz...@shmsoft.com> wrote:
 Hi,

 hdfs fsck -locations

 is supposed to show every block with its location? Is the location the IP of
 the datanode?

 Thank you,
 Mark


--
Harsh J




RE: hdfs fsck -locations

2014-01-24 Thread Nascimento, Rodrigo
Hi Mark,

Here is a sample from my sandbox. Your question is about the part that is in RED 
in the output below, right?

[root@sandbox ~]# hdfs fsck /user/ambari-qa/passwd  -locations
Connecting to namenode via http://sandbox.hortonworks.com:50070
FSCK started by root (auth:SIMPLE) from /172.16.13.30 for path 
/user/ambari-qa/passwd at Fri Jan 24 09:53:43 PST 2014
.
/user/ambari-qa/passwd:  Under replicated 
BP-1578958328-10.0.2.15-1382306880516:blk_1073742464_1640. Target Replicas is 3 
but found 1 replica(s).
Status: HEALTHY
 Total size: 1708 B
 Total dirs: 0
 Total files: 1
 Total symlinks: 0
 Total blocks (validated): 1 (avg. block size 1708 B)
 Minimally replicated blocks: 1 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 1 (100.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 1.0
 Corrupt blocks: 0
 Missing replicas: 2 (66.64 %)
 Number of data-nodes: 1
 Number of racks: 1
FSCK ended at Fri Jan 24 09:53:43 PST 2014 in 1 milliseconds


The filesystem under path '/user/ambari-qa/passwd' is HEALTHY
[root@sandbox ~]#
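
One thing that may explain why no locations show up in any of these runs: in the fsck usage string, -locations is nested under -files -blocks, so passing it on its own does not add per-block lines to the report. Below is a small sketch of a wrapper that always passes the three flags together. The helper names are hypothetical, and actually running it assumes the hdfs CLI is on PATH and pointed at a live cluster.

```python
import subprocess


def fsck_command(path, locations=True):
    """Build the argv for an fsck run that lists every block of `path`."""
    args = ["hdfs", "fsck", path, "-files", "-blocks"]
    if locations:
        # Only meaningful together with -files -blocks.
        args.append("-locations")
    return args


def run_fsck(path):
    """Run fsck and return its report as text (requires the hdfs CLI)."""
    result = subprocess.run(
        fsck_command(path), capture_output=True, text=True, check=True
    )
    return result.stdout
```

For the file above, the equivalent shell command would be `hdfs fsck /user/ambari-qa/passwd -files -blocks -locations`.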

Rod Nascimento

