Hadoop and security.

2008-10-04 Thread Dmitry Pushkarev
Dear hadoop users, 

 

I'm lucky to work in an academic environment where information security is not
a concern. However, I'm sure most Hadoop users aren't so fortunate.

 

Here is the question: how secure is Hadoop? (or, let's say, how foolproof is it?)

 

Here is the answer:
http://www.google.com/search?client=opera&rls=en&q=Hadoop+Map/Reduce+Administration&sourceid=opera&ie=utf-8&oe=utf-8
Not quite.

 

What we're seeing here are open Hadoop clusters, where anyone capable of
installing Hadoop and changing his username to, say, webcrawl can use their
cluster and read their data, even though a firewall is properly installed and
ports like SSH are filtered to outsiders. After you've played with the data
long enough, you'll notice that you can submit jobs as well, and those jobs
can execute shell commands. Which is very, very sad.

 

In my view, this significantly limits distributed Hadoop deployments where
part of your cluster resides on EC2 or in another remote datacenter, since
you always need to keep certain ports open to a range of IP addresses (if
your instances are dynamic), which isn't acceptable when anyone in that IP
range can connect to your cluster.

 

Can we ask the developers to introduce some basic user management and
access controls, to move Hadoop one step further toward being a
production-quality system?

 

And, by the way, add a robots.txt to the default distribution (though I doubt
it will help much, since it takes less than a week to scan the whole Internet
for a given port on a home DSL connection).

 

---

Dmitry

 



Re: "Could not get block locations. Aborting..." exception

2008-10-04 Thread Raghu Angadi


https://issues.apache.org/jira/browse/HADOOP-4346 might explain this.

Raghu.

Bryan Duxbury wrote:
Ok, so, what might I do next to try and diagnose this? Does it sound 
like it might be an HDFS/mapreduce bug, or should I pore over my own 
code first?


Also, did any of the other exceptions look interesting?

-Bryan

On Sep 29, 2008, at 10:40 AM, Raghu Angadi wrote:


Raghu Angadi wrote:

Doug Cutting wrote:

Raghu Angadi wrote:
For the current implementation, you need around 3x fds. 1024 is too 
low for Hadoop. The Hadoop requirement will come down, but 1024 
would be too low anyway.


1024 is the default on many systems.  Shouldn't we try to make the 
default configuration work well there?

How can 1024 work well for different kinds of loads?


oops! 1024 should work for anyone "working with just one file" for any 
load. I didn't notice that. My comment can be ignored.


Raghu.






UnknownHost Exception

2008-10-04 Thread Ashok Varma
I have configured my machine as pseudo-distributed.

I'm getting this error:
STARTUP_MSG:   host = java.net.UnknownHostException: hadoop: hadoop

but I have given the host name as "Hadoop-host" in /etc/hosts.
Here is the log file output:
===
2008-10-04 18:02:46,717 INFO org.apache.hadoop.dfs.NameNode.Secondary:
STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting SecondaryNameNode
STARTUP_MSG:   host = java.net.UnknownHostException: hadoop: hadoop
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.18.0
STARTUP_MSG:   build =
http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r 686010;
compiled by 'hadoopqa' on Thu Aug 14 19:48:33 UTC 2008
************************************************************/
2008-10-04 18:02:46,815 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=SecondaryNameNode, sessionId=null
2008-10-04 18:02:46,829 INFO org.apache.hadoop.util.MetricsUtil: Unable to
obtain hostName
java.net.UnknownHostException: hadoop: hadoop
at java.net.InetAddress.getLocalHost(InetAddress.java:1353)
at
org.apache.hadoop.metrics.MetricsUtil.getHostName(MetricsUtil.java:86)
at
org.apache.hadoop.metrics.MetricsUtil.createRecord(MetricsUtil.java:75)
at org.apache.hadoop.metrics.jvm.JvmMetrics.(JvmMetrics.java:77)
at org.apache.hadoop.metrics.jvm.JvmMetrics.init(JvmMetrics.java:69)
at
org.apache.hadoop.dfs.SecondaryNameNode.initialize(SecondaryNameNode.java:120)
at
org.apache.hadoop.dfs.SecondaryNameNode.(SecondaryNameNode.java:108)
at
org.apache.hadoop.dfs.SecondaryNameNode.main(SecondaryNameNode.java:460)
2008-10-04 18:02:47,005 WARN org.apache.hadoop.dfs.Storage: Checkpoint
directory /tmp/hadoop-hadoop/dfs/namesecondary is added.
2008-10-04 18:02:47,092 INFO org.mortbay.util.Credential: Checking Resource
aliases
2008-10-04 18:02:47,146 INFO org.mortbay.http.HttpServer: Version
Jetty/5.1.4
2008-10-04 18:02:47,148 INFO org.mortbay.util.Container: Started
HttpContext[/static,/static]
2008-10-04 18:02:47,148 INFO org.mortbay.util.Container: Started
HttpContext[/logs,/logs]
2008-10-04 18:02:47,616 INFO org.mortbay.jetty.servlet.XMLConfiguration: No
WEB-INF/web.xml in file:/home/hadoop/Hadoop/hadoop-0.18.0/webapps/secondary.
Serving files and default/dynamic servlets only
2008-10-04 18:02:47,618 INFO org.mortbay.util.Container: Started
[EMAIL PROTECTED]
2008-10-04 18:02:47,669 INFO org.mortbay.util.Container: Started
WebApplicationContext[/,/]
2008-10-04 18:02:47,672 INFO org.mortbay.http.SocketListener: Started
SocketListener on 0.0.0.0:50090
2008-10-04 18:02:47,672 INFO org.mortbay.util.Container: Started
[EMAIL PROTECTED]
2008-10-04 18:02:47,672 INFO org.apache.hadoop.dfs.NameNode.Secondary:
Secondary Web-server up at: 0.0.0.0:50090
2008-10-04 18:02:47,672 WARN org.apache.hadoop.dfs.NameNode.Secondary:
Checkpoint Period   :3600 secs (60 min)
2008-10-04 18:02:47,672 WARN org.apache.hadoop.dfs.NameNode.Secondary: Log
Size Trigger:67108864 bytes (65536 KB)
2008-10-04 18:05:33,896 INFO org.apache.hadoop.dfs.NameNode.Secondary:
SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down SecondaryNameNode at
java.net.UnknownHostException: hadoop: hadoop
************************************************************/
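
The failing call in the stack trace is InetAddress.getLocalHost(), which the
SecondaryNameNode uses to find its own hostname. A small standalone check
(illustrative only, not part of Hadoop) that reproduces the same lookup:

import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostnameCheck {
    public static void main(String[] args) {
        try {
            // Same lookup the daemon performs at startup: resolve the
            // machine's own hostname to an address.
            InetAddress addr = InetAddress.getLocalHost();
            System.out.println("hostname = " + addr.getHostName());
            System.out.println("address  = " + addr.getHostAddress());
        } catch (UnknownHostException e) {
            // The failure in the log above: the name reported by the OS
            // (here apparently "hadoop") has no entry in /etc/hosts or DNS.
            System.err.println("local hostname does not resolve: " + e);
        }
    }
}

If this throws the same exception, the machine's actual hostname (whatever the
hostname command prints, apparently "hadoop" here) needs a matching /etc/hosts
entry; an entry for "Hadoop-host" alone won't help unless the hostname really
is Hadoop-host.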


Re: A question about Mapper

2008-10-04 Thread Zhou, Yunqing
Thanks a lot for such a detailed explanation, but I think the reducer here is
unnecessary, so I set the number of reducers to 0. I'd like to solve it all in
the mappers, which is how I ran into the problem.
Thanks anyway.

On Sat, Oct 4, 2008 at 3:33 PM, Joman Chu <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I assume you want to associate {a,b}, {c,d,e}, and {f} into sets.
>
> One way to do this is by associating some value with each flag and then
> emitting the data associated with that value. For example,
>
> flag
> a
> b
> flag
> c
> d
> e
> flag
> f
>
> I define flag,a,b,c,d,e,f to be the key while in the Mapper context.
>
> Whenever the mapper sees a key, it will emit <UID, Key>. UID is some unique
> identifier associated with a certain set, and Key is the key that was passed
> into the mapper. We are essentially inverting the association here.
>
> Let's step through this testcase.
>  1. Choose UID = mapper1flag1.
>  2. <flag> -> Mapper -> <mapper1flag1, flag>
>  3. We have reached a flag, so we change the UID = mapper1flag2.
>  4. <a> -> Mapper -> <mapper1flag2, a>
>  5. <b> -> Mapper -> <mapper1flag2, b>
>  6. <flag> -> Mapper -> <mapper1flag2, flag>
>  7. We have reached a flag, so we change the UID = mapper1flag3.
>  8. <c> -> Mapper -> <mapper1flag3, c>
>  9. <d> -> Mapper -> <mapper1flag3, d>
> 10. <e> -> Mapper -> <mapper1flag3, e>
> 11. <flag> -> Mapper -> <mapper1flag3, flag>
> 12. We have reached a flag, so we change the UID = mapper1flag4.
> 13. <f> -> Mapper -> <mapper1flag4, f>
> 14. EOF
>
> Then the reducers will collect all values with the same UID, so here is
> what we get:
>
> 1. <mapper1flag1, [flag]> -> Reducer -> <{}, null>
> 2. <mapper1flag2, [a, b, flag]> -> Reducer -> <{a,b}, null>
> 3. <mapper1flag3, [c, d, e, flag]> -> Reducer -> <{c,d,e}, null>
> 4. <mapper1flag4, [f]> -> Reducer -> <{f}, null>
>
> Hopefully this solves your problem.
>
> On Sat, October 4, 2008 2:48 am, Zhou, Yunqing said:
> > but the close() function doesn't supply me a Collector to put pairs in.
> >
> > Is it reasonable for me to store a reference of the collector in advance?
> >
> >
> > I'm not sure if the collector is still available then.
> >
> >
> >
> >
> > On Sat, Oct 4, 2008 at 12:17 PM, Joman Chu <[EMAIL PROTECTED]>
> wrote:
> >
> >
> >> Hello,
> >>
> >> Does MapReduceBase.close() fit your needs? Take a look at
> >> http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred
> >> /MapReduceBase.html#close()
> >>
> >> On Fri, October 3, 2008 11:36 pm, Zhou, Yunqing said:
> >>> the input is as follows. flag a b flag c d e flag f
> >>>
> >>> then I used a mapper to first store values and then emit them all
> >>> when it meets a line containing "flag", but when the file reaches its
> >>> end I have no chance to emit the last record (in this case, f). So how
> >>> can I detect the end of the mapper's life, or how can I emit a final
> >>> record before the mapper exits?
> >>>
> >>> Thanks
> >>>
> >>
> >> Have a good one, -- Joman Chu Carnegie Mellon University School of
> Computer
> >> Science 2011 AIM: ARcanUSNUMquam
> >>
> >>
> >
>
>
> --
> Joman Chu
> Carnegie Mellon University
> School of Computer Science 2011
> AIM: ARcanUSNUMquam
>
>


Re: A question about Mapper

2008-10-04 Thread Joman Chu
Hello,

I assume you want to associate {a,b}, {c,d,e}, and {f} into sets.

One way to do this is by associating some value with each flag and then 
emitting the data associated with that value. For example,

flag
a
b
flag
c
d
e
flag
f

I define flag,a,b,c,d,e,f to be the key while in the Mapper context.

Whenever the mapper sees a key, it will emit <UID, Key>. UID is some unique
identifier associated with a certain set, and Key is the key that was passed
into the mapper. We are essentially inverting the association here.

Let's step through this testcase.
 1. Choose UID = mapper1flag1.
 2. <flag> -> Mapper -> <mapper1flag1, flag>
 3. We have reached a flag, so we change the UID = mapper1flag2.
 4. <a> -> Mapper -> <mapper1flag2, a>
 5. <b> -> Mapper -> <mapper1flag2, b>
 6. <flag> -> Mapper -> <mapper1flag2, flag>
 7. We have reached a flag, so we change the UID = mapper1flag3.
 8. <c> -> Mapper -> <mapper1flag3, c>
 9. <d> -> Mapper -> <mapper1flag3, d>
10. <e> -> Mapper -> <mapper1flag3, e>
11. <flag> -> Mapper -> <mapper1flag3, flag>
12. We have reached a flag, so we change the UID = mapper1flag4.
13. <f> -> Mapper -> <mapper1flag4, f>
14. EOF

Then the reducers will collect all values with the same UID, so here is what we
get:

1. <mapper1flag1, [flag]> -> Reducer -> <{}, null>
2. <mapper1flag2, [a, b, flag]> -> Reducer -> <{a,b}, null>
3. <mapper1flag3, [c, d, e, flag]> -> Reducer -> <{c,d,e}, null>
4. <mapper1flag4, [f]> -> Reducer -> <{f}, null>

Hopefully this solves your problem.
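
A rough sketch of a mapper along these lines, against the 0.18-era
org.apache.hadoop.mapred API (class names and the task-id lookup are
illustrative, not taken from this thread):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Tag every input line with a UID that changes whenever a "flag" line is
// seen; reducers then group lines by UID.
public class FlagGroupMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private String taskId = "mapper1";  // keeps UIDs unique across map tasks
  private long flagCount = 1;         // bumped on every "flag" line

  public void configure(JobConf job) {
    // Use the task id to keep UIDs from different mappers distinct.
    taskId = job.get("mapred.task.id", taskId);
  }

  public void map(LongWritable offset, Text line,
                  OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    // Emit the line under the current UID, then start a new UID if the
    // line was a flag (the same order as in the walkthrough above).
    out.collect(new Text(taskId + "flag" + flagCount), line);
    if (line.toString().trim().equals("flag")) {
      flagCount++;
    }
  }
}

A matching reducer would just collect the values for each UID and drop the
flag entries, leaving {a,b}, {c,d,e}, and {f}.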

On Sat, October 4, 2008 2:48 am, Zhou, Yunqing said:
> but the close() function doesn't supply me a Collector to put pairs in.
> 
> Is it reasonable for me to store a reference of the collector in advance?
> 
> 
> I'm not sure if the collector is still available then.
> 
> 
> 
> 
> On Sat, Oct 4, 2008 at 12:17 PM, Joman Chu <[EMAIL PROTECTED]> wrote:
> 
> 
>> Hello,
>> 
>> Does MapReduceBase.close() fit your needs? Take a look at 
>> http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred
>> /MapReduceBase.html#close()
>> 
>> On Fri, October 3, 2008 11:36 pm, Zhou, Yunqing said:
>>> the input is as follows. flag a b flag c d e flag f
>>> 
>>> then I used a mapper to first store values and then emit them all
>>> when it meets a line containing "flag", but when the file reaches its
>>> end I have no chance to emit the last record (in this case, f). So how
>>> can I detect the end of the mapper's life, or how can I emit a final
>>> record before the mapper exits?
>>> 
>>> Thanks
>>> 
>> 
>> Have a good one, -- Joman Chu Carnegie Mellon University School of Computer
>> Science 2011 AIM: ARcanUSNUMquam
>> 
>> 
> 


-- 
Joman Chu
Carnegie Mellon University
School of Computer Science 2011
AIM: ARcanUSNUMquam
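
For the map-only variant Yunqing wants (zero reducers, grouping done entirely
in the mapper), a common workaround with the old API is to hold on to the
OutputCollector passed into map() and flush the final buffered group from
close(). A rough sketch with illustrative class names, assuming (as is normal
with the old API) that the collector is still usable when close() runs:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Map-only grouping: buffer lines until the next "flag", emit the finished
// group, and flush the last group from close() so it isn't lost at EOF.
public class FlagGroupMapOnly extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, NullWritable> {

  private final List<String> buffer = new ArrayList<String>();
  // Saved so close() can still emit; with the old API the collector passed
  // to map() is normally still open when close() is called.
  private OutputCollector<Text, NullWritable> savedOut;

  public void map(LongWritable offset, Text line,
                  OutputCollector<Text, NullWritable> out, Reporter reporter)
      throws IOException {
    savedOut = out;
    String value = line.toString().trim();
    if (value.equals("flag")) {
      flush();            // a flag closes the current group
    } else {
      buffer.add(value);  // still collecting the current group
    }
  }

  public void close() throws IOException {
    flush();              // emit the trailing group (e.g. {f}) at end of input
  }

  private void flush() throws IOException {
    if (savedOut != null && !buffer.isEmpty()) {
      // Emit the whole group as one record; the value is unused here.
      savedOut.collect(new Text(buffer.toString()), NullWritable.get());
      buffer.clear();
    }
  }
}

This emits each completed group when the next flag arrives and emits the
trailing group from close(), which is the "last record before the mapper
exits" that the original question asked about.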