Yes, the first job distributes the slices of input data among the BSP processors.
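
For anyone following along, here is a rough sketch of what such a partitioning pass does in general. It is not Hama's actual code: the class name `SlicePartitioner`, its methods, and the choice of hash partitioning are all assumptions made purely for illustration. The idea is that one pass reads the single input and writes one slice per BSP task, which would be consistent with the log below, where the first job sees 1 input path and the second sees 8.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: NOT Hama's partitioner, just the general idea.
final class SlicePartitioner {

    // Pick the slice (peer index) for a record.
    // Hash partitioning is an assumption made for this example.
    static int peerFor(String record, int numPeers) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(record.hashCode(), numPeers);
    }

    // Split the input records into numPeers slices, one per BSP task.
    static List<List<String>> partition(List<String> records, int numPeers) {
        List<List<String>> slices = new ArrayList<>();
        for (int i = 0; i < numPeers; i++) {
            slices.add(new ArrayList<>());
        }
        for (String record : records) {
            slices.get(peerFor(record, numPeers)).add(record);
        }
        return slices;
    }

    public static void main(String[] args) {
        // Made-up records standing in for lines of the input file.
        List<String> records = Arrays.asList("n1", "n2", "n3", "n4", "n5");
        List<List<String>> slices = partition(records, 8);
        for (int i = 0; i < slices.size(); i++) {
            System.out.println("slice " + i + ": " + slices.get(i));
        }
    }
}
```

Every record lands in exactly one slice, so the second job can give each of its tasks one slice as its own input.
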

On Sat, Apr 27, 2013 at 1:58 AM, Leonidas Fegaras <[email protected]> wrote:
> OK. MRQL works fine now with Hama 0.7.0 in distributed mode.
> I haven't tested it on a real cluster yet.
> I am attaching the output from pagerank.
> By the way, Hama 0.7.0 runs 2 jobs for each BSPjob, although the first is fast.
> Is this done to distribute the data among peers?
> Leonidas
>
> 13/04/26 10:13:50 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> *** Using 8 BSP tasks (out of a max 8). Each task will handle about 2525538 bytes of input data.
> 13/04/26 10:13:50 INFO bsp.FileInputFormat: Total input paths to process : 1
> 13/04/26 10:13:50 INFO bsp.FileInputFormat: Total input paths to process : 1
> 13/04/26 10:13:50 INFO bsp.BSPJobClient: Running job: job_201304260948_0020
> 13/04/26 10:13:53 INFO bsp.BSPJobClient: Current supersteps number: 0
> 13/04/26 10:14:02 INFO bsp.BSPJobClient: Current supersteps number: 2
> 13/04/26 10:14:05 INFO bsp.BSPJobClient: The total number of supersteps: 2
> 13/04/26 10:14:05 INFO bsp.BSPJobClient: Counters: 6
> 13/04/26 10:14:05 INFO bsp.BSPJobClient: org.apache.hama.bsp.JobInProgress$JobCounter
> 13/04/26 10:14:05 INFO bsp.BSPJobClient: SUPERSTEPS=2
> 13/04/26 10:14:05 INFO bsp.BSPJobClient: LAUNCHED_TASKS=1
> 13/04/26 10:14:05 INFO bsp.BSPJobClient: org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> 13/04/26 10:14:05 INFO bsp.BSPJobClient: SUPERSTEP_SUM=2
> 13/04/26 10:14:05 INFO bsp.BSPJobClient: TIME_IN_SYNC_MS=178
> 13/04/26 10:14:05 INFO bsp.BSPJobClient: IO_BYTES_READ=20204222
> 13/04/26 10:14:05 INFO bsp.BSPJobClient: TASK_INPUT_RECORDS=918362
> 13/04/26 10:14:05 INFO bsp.FileInputFormat: Total input paths to process : 8
> 13/04/26 10:14:06 INFO bsp.BSPJobClient: Running job: job_201304260948_0019
> 13/04/26 10:14:09 INFO bsp.BSPJobClient: Current supersteps number: 0
> 13/04/26 10:14:18 INFO bsp.BSPJobClient: Current supersteps number: 2
> 13/04/26 10:14:30 INFO bsp.BSPJobClient: Current supersteps number: 3
> 13/04/26 10:14:33 INFO bsp.BSPJobClient: Current supersteps number: 4
> 13/04/26 10:14:36 INFO bsp.BSPJobClient: Current supersteps number: 5
> 13/04/26 10:14:42 INFO bsp.BSPJobClient: Current supersteps number: 6
> 13/04/26 10:14:45 INFO bsp.BSPJobClient: Current supersteps number: 8
> 13/04/26 10:14:54 INFO bsp.BSPJobClient: Current supersteps number: 11
> 13/04/26 10:15:03 INFO bsp.BSPJobClient: Current supersteps number: 14
> 13/04/26 10:15:12 INFO bsp.BSPJobClient: Current supersteps number: 18
> 13/04/26 10:15:15 INFO bsp.BSPJobClient: Current supersteps number: 19
> 13/04/26 10:15:15 INFO bsp.BSPJobClient: The total number of supersteps: 19
> 13/04/26 10:15:15 INFO bsp.BSPJobClient: Counters: 9
> 13/04/26 10:15:15 INFO bsp.BSPJobClient: org.apache.hama.bsp.JobInProgress$JobCounter
> 13/04/26 10:15:15 INFO bsp.BSPJobClient: SUPERSTEPS=19
> 13/04/26 10:15:15 INFO bsp.BSPJobClient: LAUNCHED_TASKS=8
> 13/04/26 10:15:15 INFO bsp.BSPJobClient: org.apache.hama.bsp.BSPPeerImpl$PeerCounter
> 13/04/26 10:15:15 INFO bsp.BSPJobClient: SUPERSTEP_SUM=152
> 13/04/26 10:15:15 INFO bsp.BSPJobClient: TIME_IN_SYNC_MS=132721
> 13/04/26 10:15:15 INFO bsp.BSPJobClient: IO_BYTES_READ=22986388
> 13/04/26 10:15:15 INFO bsp.BSPJobClient: TOTAL_MESSAGES_SENT=5694804
> 13/04/26 10:15:15 INFO bsp.BSPJobClient: TASK_INPUT_RECORDS=918362
> 13/04/26 10:15:15 INFO bsp.BSPJobClient: COMPRESSED_MESSAGES=8
> 13/04/26 10:15:15 INFO bsp.BSPJobClient: TOTAL_MESSAGES_RECEIVED=5694804
>
> On 04/25/2013 08:05 PM, Edward J. Yoon wrote:
>>
>> Oh.. thanks. Here's another snapshot:
>>
>> http://people.apache.org/~edwardyoon/dist/0.7.0-SNAPSHOT/hama-0.7.0-SNAPSHOT2.tar.gz
>>
>> I've tested successfully on my laptop. Can you please test one more
>> time with this?
>>
>> On Fri, Apr 26, 2013 at 1:44 AM, Leonidas Fegaras <[email protected]> wrote:
>>>
>>> OK. I tested it on my 8-core laptop. It seems that the problem with comma
>>> separated HDFS files in distributed mode has not been fixed yet:
>>>
>>> FileInputFormat.setInputPaths(job,"hdfs://localhost:9000/user/fegaras/tests/data/orders.tbl,hdfs://localhost:9000/user/fegaras/tests/data/customer.tbl");
>>>
>>> I get the error:
>>> java.net.URISyntaxException: Relative path in absolute URI: localhost:9000
>>>
>>> So I can't do joins.
>>> Queries that work on a single input file work fine in distributed mode.
>>> Their runtime on my laptop is comparable to that of Hama 0.5.0.
>>> Leonidas
>>>
>>> On 04/24/2013 03:25 AM, Edward J. Yoon wrote:
>>>>
>>>> Leonidas,
>>>>
>>>> Could you please test with
>>>> http://people.apache.org/~edwardyoon/dist/0.7.0-SNAPSHOT/ and feedback
>>>> me?
>>>>
>>>> On Tue, Apr 23, 2013 at 11:07 PM, Leonidas Fegaras <[email protected]> wrote:
>>>>>
>>>>> Yes, I think this is fine. I can test a pre-release of Hama 0.6.2 to make
>>>>> sure that works well with MRQL.
>>>>> I have also extended the MRQL make/ant files to work with Yarn. They will be
>>>>> part of the next patch. I have tested MRQL on Yarn in local mode only
>>>>> because I don't have access to a Yarn cluster.
>>>>> Leonidas
>>>>>
>>>>> On Apr 22, 2013, at 6:10 PM, Edward J. Yoon wrote:
>>>>>
>>>>>> Since Hama 0.6 version is more memory efficient than the old version,
>>>>>> let's try to release based on Hama 0.6.* version. I want to evaluate
>>>>>> MRQL's both MR version and BSP version, with large data sets on my
>>>>>> cluster. I'll fix that problem soon and release Hama 0.6.2. What do
>>>>>> you think?
>>>>>>
>>>>>> On Thu, Apr 18, 2013 at 6:22 AM, Edward J. Yoon <[email protected]> wrote:
>>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> On Thu, Apr 18, 2013 at 12:12 AM, Leonidas Fegaras <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Edward,
>>>>>>>> Unfortunately, the current MRQL doesn't work correctly with Hama 0.6.x.
>>>>>>>> It works fine with Hama 0.5.0.
>>>>>>>> (The splits generated by the FileInputFormat in Hama 0.6.0 cannot be smaller
>>>>>>>> than a block, while Hama 0.6.1 doesn't work correctly with comma separated
>>>>>>>> paths, which prevents joins).
>>>>>>>> We can wait for the next Hama release (date?) or we can just release it as
>>>>>>>> is for Hama 0.5.0.
>>>>>>>> In either case, let's put a tentative release date: May 15, so we will have
>>>>>>>> one month to write all guides and to setup a testbed.
>>>>>>>> Do you agree to have our first release on May 15?
>>>>>>>> Leonidas
>>>>>>>>
>>>>>>>> On Apr 17, 2013, at 2:55 AM, Edward J. Yoon wrote:
>>>>>>>>
>>>>>>>>> I personally would recommend you release a first Apache MRQL (with a
>>>>>>>>> well-described guide on how to get started or involved) that works
>>>>>>>>> with open source Apache Hadoop 1.0 and Hama 0.6.x.
>>>>>>>>>
>>>>>>>>> On Sat, Apr 13, 2013 at 12:38 AM, Leonidas Fegaras <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> I think the obvious person to manage the first release is me, if there is
>>>>>>>>>> no other volunteer.
>>>>>>>>>> I don't have any experience with release plans. Do we need to setup a
>>>>>>>>>> timeline for future releases?
>>>>>>>>>> Maybe we should develop a testbed first to be run on different cluster
>>>>>>>>>> sizes before each official release.
>>>>>>>>>> Leonidas
>>>>>>>>>>
>>>>>>>>>> On Apr 11, 2013, at 8:42 PM, Edward J. Yoon wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> What are our plans for our first release under ASF? And who is going
>>>>>>>>>>> to do the release managing?
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Best Regards, Edward J. Yoon
>>>>>>>>>>> @eddieyoon
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best Regards, Edward J. Yoon
>>>>>>>>> @eddieyoon
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards, Edward J. Yoon
>>>>>>> @eddieyoon
>>>>>>
>>>>>> --
>>>>>> Best Regards, Edward J. Yoon
>>>>>> @eddieyoon
>>>>>
>>>>
>>
>

--
Best Regards, Edward J. Yoon
@eddieyoon
