Re: Mechanism of hadoop -jar
Hi Jay Vyas,

If you are trying to access a bundle of text files, instead of going for class.getResource(), why don't you use the DistributedCache in Hadoop? As I noted previously on this mailing list, I hope it will serve you better. If there is any problem, let us know.

Regards,
Syed Abdul Kather

On Aug 12, 2012 4:17 AM, Bertrand Dechoux wrote:
> 1) The source code, for sure. I don't know if you could find any other technical document about it.
>
> 2) Where is your file? If it is inside your jar, Hadoop should not interfere with the 'normal way': it is a classical JVM. If you want to distribute your files across your nodes, you should look at DistributedCache.
> http://hadoop.apache.org/common/docs/r1.0.0/api/org/apache/hadoop/filecache/DistributedCache.html
>
> Regards,
> Bertrand
>
> On Sun, Aug 12, 2012 at 12:09 AM, Jay Vyas wrote:
>> Hi guys:
>>
>> I'm trying to find documentation on how "hadoop jar" actually works, i.e. how it copies/runs the jar file across the cluster, in order to debug a jar issue.
>>
>> 1) Where can I get a good explanation of how the hadoop commands (i.e. jar) are implemented?
>>
>> 2) Specifically, I'm trying to access a bundled text file from a jar, via class.getResource("myfile.txt"), from inside a MapReduce job. Is it okay to do this? Or does a class's ability to acquire local resources change in the mapper/reducer JVMs?
>>
>> --
>> Jay Vyas
>> MMSB/UCHC
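Bertrand's DistributedCache suggestion, sketched against the Hadoop 1.x API. This is a minimal sketch, not the poster's code: the HDFS path and file name are illustrative assumptions.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

// In the job driver: register a file that already sits on HDFS.
Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("/user/jay/myfile.txt"), conf);

// In Mapper.setup(): each task node now has a local copy to read directly
// (inside a mapper, obtain the conf via context.getConfiguration()).
Path[] cached = DistributedCache.getLocalCacheFiles(conf);
```

The framework copies the file to every task node before the tasks start, so the mappers/reducers read a local path rather than relying on jar-resource lookup.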
THANKS AND REGARDS,
SYED ABDUL KATHER

--
View this message in context: http://lucene.472066.n3.nabble.com/Mechanism-of-hadoop-jar-tp4000622p4000628.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
Re: is HDFS RAID data locality efficient?
Nice explanation, guys. Thanks.

Syed Abdul Kather
Sent from Samsung S3

On Aug 9, 2012 12:02 AM, Ajit Ratnaparkhi wrote:
> Agreed with Steve. That is the most important use of HDFS RAID: you consume less disk space with the same reliability and availability guarantees, at the cost of processing performance. Most of the data in HDFS is cold data; without HDFS RAID you end up maintaining 3 replicas of data which is hardly going to be processed again, but you can't remove/move this data to a separate archive because, if processing is required, it should happen as soon as possible.
>
> -Ajit
>
> On Wed, Aug 8, 2012 at 11:01 PM, Steve Loughran wrote:
>> On 8 August 2012 09:46, Sourygna Luangsay wrote:
>>> Hi folks!
>>>
>>> One of the scenarios I can think of in order to take advantage of HDFS RAID without suffering this penalty is:
>>> - Using normal HDFS with default replication=3 for my "fresh data"
>>> - Using HDFS RAID for my historical data (that is barely used by M/R)
>>
>> Exactly: less space used on cold data, with the penalty that access performance can be worse. As the majority of data on a Hadoop cluster is usually cold, it's a space- and power-efficient story for the archive data.
>>
>> --
>> Steve Loughran
>> Hortonworks Inc
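To put rough numbers on the space saving being discussed: the figures below are a back-of-the-envelope sketch, assuming the commonly cited HDFS RAID setup (XOR parity over stripes of 10 blocks, source replication lowered to 2, parity blocks themselves replicated 2x) rather than anything stated in this thread.

```java
// Compare raw-to-logical storage ratios: plain 3x replication vs.
// XOR-raided cold data under the assumed defaults above.
public class RaidSpace {
    public static void main(String[] args) {
        double plain = 3.0;                // default replication factor
        double raided = 2.0 + 2.0 / 10.0;  // 2 copies + replicated parity per 10-block stripe
        System.out.printf(java.util.Locale.ROOT,
                "plain=%.1fx raided=%.1fx%n", plain, raided);
    }
}
```

So under these assumptions, raiding cold data cuts its footprint from 3.0x to about 2.2x of the logical size, which is where the "less disk space, same availability" claim comes from.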
THANKS AND REGARDS,
SYED ABDUL KATHER
Re: creating cluster to analyze unsubscribe emails
:-)

Syed Abdul Kather
Sent from Samsung S3

On Aug 9, 2012 4:42 AM, Hennig, Ryan wrote:
> Hello,
>
> I'm thinking about building a hadoop cluster to analyze all the unsubscribe mails that people mistakenly send to this address. How many PB of storage will I need?
>
> - Ryan
Re: Hadoop list?
Yes, Ryan. Welcome to the Hadoop community.

Syed Abdul Kather
Sent from Samsung S3

On Aug 9, 2012 10:34 AM, Ryan Rosario wrote:
> Is this the correct list for Hadoop help?
Re: Trigger job from Java application causes ClassNotFound
Hi Steve,

> But if I try and run it from my dev PC in Eclipse (where all the same
> dependencies are still in the classpath), and add the 3 Hadoop XML files
> to the classpath, it triggers Hadoop jobs, but they fail with errors

There is a problem in your Eclipse build path. I faced the same problem when I was trying to do clustering; have a look at the build path. In the Maven case it will download all the dependency jars from the repo, but if you want to execute it in Eclipse then you have to configure the build path yourself. I can suggest you look at
http://shuyo.wordpress.com/2011/02/01/mahout-development-environment-with-maven-and-eclipse-1/
which can help you.

Thanks and Regards,
SYED ABDUL KATHER

On Fri, Jul 27, 2012 at 6:51 AM, Steve Armstrong wrote:
> Hi Syed,
>
> Do you mean I need to deploy the Mahout jars to the lib directory of the
> master node? Or all the data nodes? Or is there a way to simply tell the
> Hadoop job launcher to upload the jars itself?
>
> Steve
>
> On Thu, Jul 26, 2012 at 6:10 PM, syed kather wrote:
>> Hi Steve,
>> I suspect you have missed copying that specific jar into your Hadoop lib
>> directories. Have a look at your lib.
>>
>> On Jul 27, 2012 4:49 AM, Steve Armstrong wrote:
>>> Hello,
>>>
>>> I'm trying to trigger a Mahout job from inside my Java application
>>> (running in Eclipse), and get it running on my cluster. I have a main
>>> class that simply contains:
>>>
>>>     String[] args = new String[] {
>>>         "--input", "/input/triples.csv",
>>>         "--output", "/output/vectors.txt",
>>>         "--similarityClassname", VectorSimilarityMeasures.SIMILARITY_COOCCURRENCE.toString(),
>>>         "--numRecommendations", "1",
>>>         "--tempDir", "temp/" + System.currentTimeMillis()
>>>     };
>>>     Configuration conf = new Configuration();
>>>     ToolRunner.run(conf, new RecommenderJob(), args);
>>>
>>> If I package the whole project up in a single jar (using Maven), copy it
>>> to the namenode, and run it with "hadoop jar project.jar", it works fine.
>>>
>>> But if I try and run it from my dev PC in Eclipse (where all the same
>>> dependencies are still in the classpath), and add the 3 Hadoop XML files
>>> to the classpath, it triggers Hadoop jobs, but they fail with errors like:
>>>
>>>     12/07/26 14:42:09 INFO mapred.JobClient: Task Id : attempt_201206261211_0173_m_01_0, Status : FAILED
>>>     Error: java.lang.ClassNotFoundException: com.google.common.primitives.Longs
>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
>>>         ...
>>>
>>> What I'm trying to create is a self-contained JAR that can be run from
>>> the command-line and launch the Mahout job on the cluster. I've got this
>>> all working with embedded pig scripts, but I can't get it working here.
>>> Any help is appreciated, or advice on better ways to trigger the jobs
>>> from code.
>>>
>>> Thanks
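One way to get Steve's "tell the job launcher to upload the jars itself": since the driver already goes through ToolRunner, the generic `-libjars` option (parsed by GenericOptionsParser before the tool sees its own arguments) ships the listed jars to the cluster and puts them on the task classpath. A minimal sketch; the jar paths and versions below are illustrative assumptions, not from the thread:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.cf.taste.hadoop.item.RecommenderJob;

public class Launcher {
    public static void main(String[] rawArgs) throws Exception {
        String[] args = new String[] {
            // generic options must come before the job's own options
            "-libjars", "/home/steve/libs/guava-r09.jar,/home/steve/libs/mahout-core-0.7-job.jar",
            "--input", "/input/triples.csv",
            "--output", "/output/vectors.txt",
            "--numRecommendations", "1"
        };
        // ToolRunner strips the generic options, stages the jars on HDFS,
        // and adds them to each task's classpath before RecommenderJob runs.
        ToolRunner.run(new Configuration(), new RecommenderJob(), args);
    }
}
```

This avoids copying Mahout jars into every node's Hadoop lib directory: the client uploads them per job, which would also cover the missing Guava class in the stack trace above.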
THANKS AND REGARDS,
SYED ABDUL KATHER
Re: Hadoop 1.0.3 start-daemon.sh doesn't start all the expected daemons
Hi Dinesh Joshi,

Can you please paste your XML files (core-site, hdfs-site, mapred-site)? And did you find any errors in your log dir?

Thanks and Regards,
SYED ABDUL KATHER

On Fri, Jul 27, 2012 at 2:54 PM, Dinesh Joshi wrote:
> Hi all,
>
> I installed Hadoop 1.0.3 and am running it as a single node cluster. I
> noticed that start-daemon.sh only starts the Namenode, Secondary Namenode
> and the JobTracker daemon. The Datanode and Tasktracker daemons are not
> started. However, when I start them individually they start up without
> any issues.
>
> I'm running this on Ubuntu 11.04 and followed the instructions here [1][2].
> Can someone point me to some debugging steps? Thanks.
>
> Dinesh
>
> [1] http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
> [2] http://hadoop.apache.org/common/docs/stable/mapred_tutorial.html
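As a starting point for the debugging steps Dinesh asks for, here is a minimal triage sketch. It assumes a default Hadoop 1.0.3 tarball layout (the bundled start-all.sh rather than the start-daemon.sh mentioned above, and the standard logs/ directory); adjust the paths for your install.

```shell
# Start all daemons, then confirm which ones actually came up.
bin/start-all.sh
jps   # expect: NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker

# If DataNode or TaskTracker is missing from jps, its log usually says why
# (common causes: bad dfs.data.dir permissions, namespace-ID mismatch
# after a reformat, or a port already in use).
tail -n 50 logs/hadoop-*-datanode-*.log
tail -n 50 logs/hadoop-*-tasktracker-*.log
```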
Re: Can I change hadoop.tmp.dir for each job run without formatting
Hi Abhay,

As Alok mentioned, that's a perfect choice to override it at runtime. Make sure the property is not set as final in the configuration file.

Regards,
Syed

On Jul 26, 2012 12:16 AM, Alok Kumar wrote:
> Hi Abhay,
>
> On Wed, Jul 25, 2012 at 10:44 PM, Abhay Ratnaparkhi wrote:
>> hadoop.tmp.dir points to the directory on local disk used to store
>> intermediate task-related data. It's currently set to /tmp/hadoop for
>> me. Some of my jobs are running, and the filesystem on which /tmp is
>> mounted is getting full. Is it possible to change the hadoop.tmp.dir
>> parameter before submitting a new job?
>
> You can override hadoop.tmp.dir every time before submitting your job:
>
>     Configuration config = new Configuration();
>     config.set("hadoop.tmp.dir", "/home/user/some-other-path");
>     Job job = new Job(config, "Job1");
>
> I tried it like this and it produced the same result (I didn't format
> anything).
>
> Thanks
> --
> Alok
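Syed's "not set as final" caveat refers to the `<final>` flag in the site configuration files: a property declared final there cannot be overridden by a per-job Configuration.set() call. An illustrative core-site.xml fragment (the value is an example, not Abhay's actual config):

```xml
<!-- If hadoop.tmp.dir is marked final like this in core-site.xml,
     per-job overrides via config.set("hadoop.tmp.dir", ...) are
     silently ignored. Remove <final>true</final> to allow them. -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop</value>
  <final>true</final>
</property>
```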
THANKS AND REGARDS,
SYED ABDUL KATHER
Re: incremental loads into hadoop
There are two methods for processing OLTP data like this:

1. Stream it in with HStreaming or Scribe; these are the streaming methods.
2. Otherwise, use Chukwa for collecting and storing the data, so that once you have got a decent volume you can move it to HDFS.

Thanks and Regards,
SYED ABDUL KATHER
9731841519

On Sat, Oct 1, 2011 at 4:32 AM, Sam Seigal wrote:
> Hi,
>
> I am relatively new to Hadoop and was wondering how to do incremental
> loads into HDFS. I have a continuous stream of data flowing into a
> service which is writing to an OLTP store. Due to the high volume of
> data, we cannot do aggregations on the OLTP store, since this starts
> affecting the write performance. We would like to offload this
> processing into a Hadoop cluster, mainly for doing
> aggregations/analytics. The question is: how can this continuous stream
> of data be incrementally loaded and processed into Hadoop?
>
> Thank you,
> Sam

THANKS AND REGARDS,
SYED ABDUL KATHER
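Alongside the collector tools above, the simplest incremental load is a periodic batch push from the OLTP side. A minimal sketch, assuming exported files accumulate in a local spool directory and a date-partitioned HDFS layout; both the paths and the layout are illustrative assumptions:

```shell
# Push the accumulated OLTP export into a dated HDFS directory,
# then clear the local spool so the next run only ships new files.
DAY=$(date +%Y/%m/%d)
hadoop fs -mkdir /data/incoming/$DAY
hadoop fs -put /var/spool/oltp-export/*.log /data/incoming/$DAY/
rm /var/spool/oltp-export/*.log
```

Aggregation jobs can then run over /data/incoming/$DAY on the cluster without ever touching the OLTP store's write path.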