Re: Mechanism of hadoop -jar

2012-08-11 Thread in.abdul
Hi Jay Vyas,
   If you are trying to access a bundle of text files, instead of going
for class.getResource() why don't you go for DistributedCache in Hadoop,
as I had previously noted on this mailing list.

I hope that helps. If there is any problem, let us know.
Regards
Syed abdul kather
On Aug 12, 2012 4:17 AM, Bertrand Dechoux [via Lucene] 
ml-node+s472066n400062...@n3.nabble.com wrote:

 1) Source code for sure. I don't know if you could find any other technical
 document about it. (If you want a place to start reading, the entry point
 for the command is the RunJar class.)

 2) Where is your file? If it is inside your jar, Hadoop should not interfere
 with the 'normal way': it is a classical JVM. If you want to distribute
 your files (across your nodes), you should look at DistributedCache.

 http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/filecache/DistributedCache.html
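
 For example, a minimal sketch of the pattern (the HDFS path and file name
 are hypothetical; the calls are from the 1.0 API linked above):

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.filecache.DistributedCache;
 import org.apache.hadoop.fs.Path;

 // Driver side: register a file that already sits in HDFS.
 Configuration conf = new Configuration();
 DistributedCache.addCacheFile(new Path("/cache/myfile.txt").toUri(), conf);

 // Task side (e.g. in the mapper's configure()): the file is now local.
 Path[] cached = DistributedCache.getLocalCacheFiles(conf);
 // cached[0] can be read with ordinary java.io on the task node's disk.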

 Regards

 Bertrand

 On Sun, Aug 12, 2012 at 12:09 AM, Jay Vyas [hidden email] wrote:

  Hi guys: I'm trying to find documentation on how hadoop jar actually
  works, i.e. how it copies/runs the jar file across the cluster, in order
  to debug a jar issue.

  1) Where can I get a good explanation of how the hadoop commands (i.e.
  -jar) are implemented?
 
  2) Specifically, I'm trying to access a bundled text file from a jar:

  class.getResource("myfile.txt")

  from inside a MapReduce job. Is it okay to do this? Or does a class's
  ability to acquire local resources change in the mapper/reducer JVMs?
 
 
 
  --
  Jay Vyas
  MMSB/UCHC
 



 --
 Bertrand Dechoux







-
THANKS AND REGARDS,
SYED ABDUL KATHER

Re: is HDFS RAID data locality efficient?

2012-08-08 Thread in.abdul
Nice explanation guys .. thanks

Syed Abdul kather
Sent from Samsung S3
On Aug 9, 2012 12:02 AM, Ajit Ratnaparkhi [via Lucene] 
ml-node+s472066n32...@n3.nabble.com wrote:

 Agreed with Steve.
 That is the most important use of HDFS RAID: you consume less disk space
 with the same reliability and availability guarantees, at the cost of
 processing performance. Most data in HDFS is cold data; without HDFS RAID
 you end up maintaining 3 replicas of data that will hardly ever be
 processed again, yet you can't remove or move this data to a separate
 archive because, when it is needed, processing should start as soon as
 possible.

 -Ajit

 On Wed, Aug 8, 2012 at 11:01 PM, Steve Loughran [hidden email] wrote:



 On 8 August 2012 09:46, Sourygna Luangsay [hidden email] wrote:

  Hi folks!

  One of the scenarios I can think of in order to take advantage of HDFS
  RAID without suffering this penalty is:

  - Using normal HDFS with default replication=3 for my "fresh data"

  - Using HDFS RAID for my historical data (that is barely used by M/R)



 Exactly: less space used on cold data, with the penalty that access
 performance can be worse. As the majority of data on a Hadoop cluster is
 usually cold, it's a space- and power-efficient story for the archive data.
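
 (As a rough plain-HDFS illustration of the same space trade-off, without
 HDFS RAID's parity encoding: you can simply lower the replication factor
 on a cold path. The path below is hypothetical.)

 hadoop fs -setrep -w 2 /archive/2011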

 --
 Steve Loughran
 Hortonworks Inc









-
THANKS AND REGARDS,
SYED ABDUL KATHER

Re: creating cluster to analyze unsubscribe emails

2012-08-08 Thread in.abdul
:-)

Syed Abdul kather
Sent from Samsung S3
On Aug 9, 2012 4:42 AM, Hennig, Ryan [via Lucene] 
ml-node+s472066n370...@n3.nabble.com wrote:

  Hello,


 I’m thinking about building a hadoop cluster to analyze all the
 unsubscribe mails that people mistakenly send to this address.  How many PB
 of storage will I need?


 - Ryan







-
THANKS AND REGARDS,
SYED ABDUL KATHER

Re: Hadoop list?

2012-08-08 Thread in.abdul
Yes, Ryan. Welcome to the Hadoop community.

Syed Abdul kather
Sent from Samsung S3
On Aug 9, 2012 10:34 AM, Ryan Rosario [via Lucene] 
ml-node+s472066n429...@n3.nabble.com wrote:

 Is this the correct list for Hadoop help?







-
THANKS AND REGARDS,
SYED ABDUL KATHER

Re: Trigger job from Java application causes ClassNotFound

2012-07-27 Thread in.abdul
Hi Steve

But if I try and run it from my dev pc in Eclipse (where all the
same dependencies are still in the classpath), and add the 3 hadoop
xml files to the classpath, it triggers hadoop jobs, but they fail
with error

There is a problem in the Eclipse build path. I faced the same problem when
I was trying to do clustering. Have a look at the build path.

In the Maven case it downloads all the dependency jars from the repo. If
you want to execute that in Eclipse, then you have to configure the build
path yourself.

I suggest you look at
http://shuyo.wordpress.com/2011/02/01/mahout-development-environment-with-maven-and-eclipse-1/
which can help you.
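
Another option, if you want to keep launching from Eclipse, is to ship the
missing dependency jars to the cluster yourself. A minimal sketch, assuming
the jar has already been copied into HDFS (the path and jar name below are
hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
// Adds the jar (already in HDFS) to the classpath of every task JVM.
DistributedCache.addFileToClassPath(new Path("/libs/guava-r09.jar"), conf);
// ...then submit the job with this conf, e.g. via ToolRunner.run(...).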







Thanks and Regards,
S SYED ABDUL KATHER



On Fri, Jul 27, 2012 at 6:51 AM, Steve Armstrong [via Lucene] 
ml-node+s472066n3997615...@n3.nabble.com wrote:

 Hi Syed,

 Do you mean I need to deploy the mahout jars to the lib directory of
 the master node? Or all the data nodes? Or is there a way to simply
 tell the hadoop job launcher to upload the jars itself?

 Steve

 On Thu, Jul 26, 2012 at 6:10 PM, syed kather [hidden email] wrote:

  Hi Steve,
  I suspect you missed copying that specific jar into your Hadoop lib
  directory. Have a look at your lib.
  On Jul 27, 2012 4:49 AM, Steve Armstrong [hidden email] wrote:
 
  Hello,
 
  I'm trying to trigger a Mahout job from inside my Java application
  (running in Eclipse), and get it running on my cluster. I have a main
  class that simply contains:
 
   String[] args = new String[] { "--input", "/input/triples.csv",
   "--output", "/output/vectors.txt", "--similarityClassname",
   VectorSimilarityMeasures.SIMILARITY_COOCCURRENCE.toString(),
   "--numRecommendations", "1", "--tempDir", "temp/" +
   System.currentTimeMillis() };
   Configuration conf = new Configuration();
   ToolRunner.run(conf, new RecommenderJob(), args);
 
   If I package the whole project up in a single jar (using Maven), copy
   it to the namenode, and run it with "hadoop jar project.jar", it works
   fine. But if I try and run it from my dev PC in Eclipse (where all the
  same dependencies are still in the classpath), and add the 3 hadoop
  xml files to the classpath, it triggers hadoop jobs, but they fail
  with errors like:
 
  12/07/26 14:42:09 INFO mapred.JobClient: Task Id :
  attempt_201206261211_0173_m_01_0, Status : FAILED
  Error: java.lang.ClassNotFoundException:
 com.google.common.primitives.Longs
  at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
  ...
 
  What I'm trying to create is a self-contained JAR that can be run from
  the command-line and launch the mahout job on the cluster. I've got
  this all working with embedded pig scripts, but I can't get it working
  here.
 
  Any help is appreciated, or advice on better ways to trigger the jobs
 from
  code.
 
  Thanks
 







-
THANKS AND REGARDS,
SYED ABDUL KATHER

Re: Hadoop 1.0.3 start-daemon.sh doesn't start all the expected daemons

2012-07-27 Thread in.abdul
Hi Dinesh Joshi,
  Can you please paste your XML files (core-site, hdfs-site, mapred-site)?
And did you find any errors in your log directory?
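
(For comparison, a minimal single-node configuration in the style of the
tutorial linked below typically looks like this; localhost and the port
numbers are the usual choices, adjust for your machine.)

core-site.xml:
  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
    </property>
  </configuration>

hdfs-site.xml:
  <configuration>
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
  </configuration>

mapred-site.xml:
  <configuration>
    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:9001</value>
    </property>
  </configuration>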

Thanks and Regards,
S SYED ABDUL KATHER



On Fri, Jul 27, 2012 at 2:54 PM, Dinesh Joshi [via Lucene] 
ml-node+s472066n3997685...@n3.nabble.com wrote:

 Hi all,

 I installed Hadoop 1.0.3 and am running it as a single node cluster. I
 noticed that start-daemon.sh only starts Namenode, Secondary Namenode
 and the JobTracker daemon.

 Datanode and Tasktracker daemons are not started. However, when I
 start them individually they start up without any issues.

 I'm running this on Ubuntu 11.04 and followed the instructions here [1][2]

 Can someone point me to some debugging steps?

 Thanks.

 Dinesh

 [1]
 http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
 [2] http://hadoop.apache.org/common/docs/stable/mapred_tutorial.html







-
THANKS AND REGARDS,
SYED ABDUL KATHER

Re: Can I change hadoop.tmp.dir for each job run without formatting

2012-07-27 Thread in.abdul
Hi Abhay,
As Alok mentioned, overriding it at runtime is a perfect choice. Just make
sure the property is not set as final in the configuration file, or the
override will be ignored (see the snippet below).
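
(For reference, this is what a final property looks like in e.g.
core-site.xml; the value shown is just the /tmp/hadoop default mentioned
in this thread:)

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop</value>
    <final>true</final>
  </property>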

Regards
Syed
On Jul 26, 2012 12:16 AM, Alok Kumar [via Lucene] 
ml-node+s472066n3997300...@n3.nabble.com wrote:

 Hi Abhay,

 On Wed, Jul 25, 2012 at 10:44 PM, Abhay Ratnaparkhi [hidden email] wrote:
  hadoop.tmp.dir points to the directory on local disk to store
  intermediate task related data.
 
   It currently points to /tmp/hadoop for me. Some of my jobs are running,
   and the filesystem on which '/tmp' is mounted is getting full.
   Is it possible to change the hadoop.tmp.dir parameter before submitting
   a new job?

 You can override hadoop.tmp.dir every time before submitting your job.
 I tried it like this:

 Configuration config = new Configuration();
 config.set("hadoop.tmp.dir", "/home/user/some-other-path");
 Job job = new Job(config, "Job1");

 It produced the same result (and I didn't have to format anything).

 Thanks
 --
 Alok







-
THANKS AND REGARDS,
SYED ABDUL KATHER

Re: incremental loads into hadoop

2011-10-03 Thread in.abdul
There are two approaches for processing an OLTP stream like this:

   1. HStreaming or Scribe; these are the usual streaming methods.
   2. Otherwise, use Chukwa for staging the data, so that once you have a
   decent volume you can move it to HDFS (a minimal sketch of the batch-wise
   idea is below).
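
(As an illustration of the batch-load approach, not of Chukwa itself: each
incoming batch is written as a new timestamped HDFS file, which periodic
MapReduce jobs can then aggregate. The /incoming path is hypothetical.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BatchLoader {
    // Writes one batch of records as a new file; old files are never
    // rewritten, so loads stay incremental and the OLTP store is untouched.
    public static void writeBatch(Iterable<String> records) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("/incoming/batch-" + System.currentTimeMillis());
        FSDataOutputStream stream = fs.create(out);
        try {
            for (String record : records) {
                stream.writeBytes(record + "\n");
            }
        } finally {
            stream.close();
        }
    }
}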

Thanks and Regards,
S SYED ABDUL KATHER
9731841519


On Sat, Oct 1, 2011 at 4:32 AM, Sam Seigal [via Lucene] 
ml-node+s472066n3383949...@n3.nabble.com wrote:

 Hi,

 I am relatively new to Hadoop and was wondering how to do incremental
 loads into HDFS.

 I have a continuous stream of data flowing into a service which is
 writing to an OLTP store. Due to the high volume of data, we cannot do
 aggregations on the OLTP store, since this starts affecting the write
 performance.

 We would like to offload this processing into a Hadoop cluster, mainly
 for doing aggregations/analytics.

 The question is how can this continuous stream of data be
 incrementally loaded and processed into Hadoop ?

 Thank you,

 Sam






-
THANKS AND REGARDS,
SYED ABDUL KATHER