OK. MRQL works fine now with Hama 0.7.0 in distributed mode.
I haven't tested it on a real cluster yet.
I am attaching the output from pagerank.
By the way, Hama 0.7.0 runs 2 jobs for each BSPjob, although the first
is fast.
Is this done to distribute the data among peers?
Leonidas
13/04/26 10:13:50 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
*** Using 8 BSP tasks (out of a max 8). Each task will handle about 2525538 bytes of input data.
13/04/26 10:13:50 INFO bsp.FileInputFormat: Total input paths to process : 1
13/04/26 10:13:50 INFO bsp.FileInputFormat: Total input paths to process : 1
13/04/26 10:13:50 INFO bsp.BSPJobClient: Running job: job_201304260948_0020
13/04/26 10:13:53 INFO bsp.BSPJobClient: Current supersteps number: 0
13/04/26 10:14:02 INFO bsp.BSPJobClient: Current supersteps number: 2
13/04/26 10:14:05 INFO bsp.BSPJobClient: The total number of supersteps: 2
13/04/26 10:14:05 INFO bsp.BSPJobClient: Counters: 6
13/04/26 10:14:05 INFO bsp.BSPJobClient: org.apache.hama.bsp.JobInProgress$JobCounter
13/04/26 10:14:05 INFO bsp.BSPJobClient: SUPERSTEPS=2
13/04/26 10:14:05 INFO bsp.BSPJobClient: LAUNCHED_TASKS=1
13/04/26 10:14:05 INFO bsp.BSPJobClient: org.apache.hama.bsp.BSPPeerImpl$PeerCounter
13/04/26 10:14:05 INFO bsp.BSPJobClient: SUPERSTEP_SUM=2
13/04/26 10:14:05 INFO bsp.BSPJobClient: TIME_IN_SYNC_MS=178
13/04/26 10:14:05 INFO bsp.BSPJobClient: IO_BYTES_READ=20204222
13/04/26 10:14:05 INFO bsp.BSPJobClient: TASK_INPUT_RECORDS=918362
13/04/26 10:14:05 INFO bsp.FileInputFormat: Total input paths to process : 8
13/04/26 10:14:06 INFO bsp.BSPJobClient: Running job: job_201304260948_0019
13/04/26 10:14:09 INFO bsp.BSPJobClient: Current supersteps number: 0
13/04/26 10:14:18 INFO bsp.BSPJobClient: Current supersteps number: 2
13/04/26 10:14:30 INFO bsp.BSPJobClient: Current supersteps number: 3
13/04/26 10:14:33 INFO bsp.BSPJobClient: Current supersteps number: 4
13/04/26 10:14:36 INFO bsp.BSPJobClient: Current supersteps number: 5
13/04/26 10:14:42 INFO bsp.BSPJobClient: Current supersteps number: 6
13/04/26 10:14:45 INFO bsp.BSPJobClient: Current supersteps number: 8
13/04/26 10:14:54 INFO bsp.BSPJobClient: Current supersteps number: 11
13/04/26 10:15:03 INFO bsp.BSPJobClient: Current supersteps number: 14
13/04/26 10:15:12 INFO bsp.BSPJobClient: Current supersteps number: 18
13/04/26 10:15:15 INFO bsp.BSPJobClient: Current supersteps number: 19
13/04/26 10:15:15 INFO bsp.BSPJobClient: The total number of supersteps: 19
13/04/26 10:15:15 INFO bsp.BSPJobClient: Counters: 9
13/04/26 10:15:15 INFO bsp.BSPJobClient: org.apache.hama.bsp.JobInProgress$JobCounter
13/04/26 10:15:15 INFO bsp.BSPJobClient: SUPERSTEPS=19
13/04/26 10:15:15 INFO bsp.BSPJobClient: LAUNCHED_TASKS=8
13/04/26 10:15:15 INFO bsp.BSPJobClient: org.apache.hama.bsp.BSPPeerImpl$PeerCounter
13/04/26 10:15:15 INFO bsp.BSPJobClient: SUPERSTEP_SUM=152
13/04/26 10:15:15 INFO bsp.BSPJobClient: TIME_IN_SYNC_MS=132721
13/04/26 10:15:15 INFO bsp.BSPJobClient: IO_BYTES_READ=22986388
13/04/26 10:15:15 INFO bsp.BSPJobClient: TOTAL_MESSAGES_SENT=5694804
13/04/26 10:15:15 INFO bsp.BSPJobClient: TASK_INPUT_RECORDS=918362
13/04/26 10:15:15 INFO bsp.BSPJobClient: COMPRESSED_MESSAGES=8
13/04/26 10:15:15 INFO bsp.BSPJobClient: TOTAL_MESSAGES_RECEIVED=5694804
On 04/25/2013 08:05 PM, Edward J. Yoon wrote:
Oh.. thanks. Here's another snapshot:
http://people.apache.org/~edwardyoon/dist/0.7.0-SNAPSHOT/hama-0.7.0-SNAPSHOT2.tar.gz
I've tested successfully on my laptop. Can you please test one more
time with this?
On Fri, Apr 26, 2013 at 1:44 AM, Leonidas Fegaras <[email protected]> wrote:
OK. I tested it on my 8-core laptop. It seems that the problem with
comma-separated HDFS files in distributed mode has not been fixed yet:
FileInputFormat.setInputPaths(job,"hdfs://localhost:9000/user/fegaras/tests/data/orders.tbl,hdfs://localhost:9000/user/fegaras/tests/data/customer.tbl");
I get the error:
java.net.URISyntaxException: Relative path in absolute URI: localhost:9000
So I can't do joins.
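A minimal sketch of a sanity check on the input string itself, using only java.net.URI and no Hama classes. The class InputPathCheck and its method splitPaths are hypothetical names introduced here for illustration; they are not part of Hama or MRQL. It only shows that each path parses as an absolute URI once the list is split on commas, and it says nothing about how Hama's FileInputFormat handles the list internally.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hama or MRQL.
class InputPathCheck {
    // Split a comma-separated path list and parse each entry as a URI.
    // URI.create throws IllegalArgumentException if an entry is malformed.
    // Assumption: no individual path contains a comma.
    static List<URI> splitPaths(String commaSeparated) {
        List<URI> uris = new ArrayList<>();
        for (String entry : commaSeparated.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                uris.add(URI.create(trimmed));
            }
        }
        return uris;
    }

    public static void main(String[] args) {
        String paths = "hdfs://localhost:9000/user/fegaras/tests/data/orders.tbl,"
                     + "hdfs://localhost:9000/user/fegaras/tests/data/customer.tbl";
        // Print scheme, authority, and path of every entry.
        for (URI u : splitPaths(paths)) {
            System.out.println(u.getScheme() + " | " + u.getAuthority() + " | " + u.getPath());
        }
    }
}
```

If every entry parses with scheme hdfs and authority localhost:9000, the string passed to setInputPaths is well formed, which would point at the path handling inside the framework rather than at the caller.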
Queries that work on a single input file work fine in distributed mode.
Their runtime on my laptop is comparable to that of Hama 0.5.0.
Leonidas
On 04/24/2013 03:25 AM, Edward J. Yoon wrote:
Leonidas,
Could you please test with
http://people.apache.org/~edwardyoon/dist/0.7.0-SNAPSHOT/ and give me
feedback?
On Tue, Apr 23, 2013 at 11:07 PM, Leonidas Fegaras <[email protected]> wrote:
Yes, I think this is fine. I can test a pre-release of Hama 0.6.2 to make
sure that it works well with MRQL.
I have also extended the MRQL make/ant files to work with Yarn. They will
be part of the next patch. I have tested MRQL on Yarn in local mode only
because I don't have access to a Yarn cluster.
Leonidas
On Apr 22, 2013, at 6:10 PM, Edward J. Yoon wrote:
Since Hama 0.6 is more memory efficient than the old version, let's
try to release based on Hama 0.6.*. I want to evaluate both the MR and
BSP versions of MRQL with large data sets on my cluster. I'll fix that
problem soon and release Hama 0.6.2. What do you think?
On Thu, Apr 18, 2013 at 6:22 AM, Edward J. Yoon <[email protected]> wrote:
+1
On Thu, Apr 18, 2013 at 12:12 AM, Leonidas Fegaras <[email protected]> wrote:
Edward,
Unfortunately, the current MRQL doesn't work correctly with Hama 0.6.x.
It works fine with Hama 0.5.0.
(The splits generated by the FileInputFormat in Hama 0.6.0 cannot be
smaller than a block, while Hama 0.6.1 doesn't work correctly with
comma-separated paths, which prevents joins.)
We can wait for the next Hama release (date?) or we can just release it
as is for Hama 0.5.0.
In either case, let's set a tentative release date of May 15, so we will
have one month to write all the guides and to set up a testbed.
Do you agree to have our first release on May 15?
Leonidas
On Apr 17, 2013, at 2:55 AM, Edward J. Yoon wrote:
I personally would recommend you release a first Apache MRQL (with a
well-described guide on how to get started or involved) that works
with open source Apache Hadoop 1.0 and Hama 0.6.x.
On Sat, Apr 13, 2013 at 12:38 AM, Leonidas Fegaras <[email protected]> wrote:
I think the obvious person to manage the first release is me, if there
is no other volunteer.
I don't have any experience with release plans. Do we need to set up a
timeline for future releases?
Maybe we should first develop a testbed to be run on different cluster
sizes before each official release.
Leonidas
On Apr 11, 2013, at 8:42 PM, Edward J. Yoon wrote:
Hi all,
What are our plans for our first release under ASF? And who is going
to do the release managing?
--
Best Regards, Edward J. Yoon
@eddieyoon