Sorry for missing that detail, my bad.

Personally, I do not see why we cannot have a (2^31)-byte array, although
that would not solve my problem anyway.
I guess at this point I'll have to go for multiple ExtendedDataOutput objects
per vertex, with the EdgeIterator just "concatenating" them?
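
To make the idea concrete, here is a minimal sketch of what I mean, assuming
float messages and fixed-size chunks. The names (ChunkedFloatMessages, add,
chunkCount) are made up for illustration and are not Giraph's
ExtendedDataOutput API; java.nio.ByteBuffer only stands in for the per-vertex
byte arrays.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch: float messages for one vertex, spread over several chunks. */
final class ChunkedFloatMessages implements Iterable<Float> {
    private final int chunkBytes; // capacity of each chunk, a multiple of 4
    private final List<ByteBuffer> chunks = new ArrayList<>();

    ChunkedFloatMessages(int chunkBytes) {
        if (chunkBytes < Float.BYTES || chunkBytes % Float.BYTES != 0) {
            throw new IllegalArgumentException("chunkBytes must be a positive multiple of 4");
        }
        this.chunkBytes = chunkBytes;
    }

    /** Appends one message, opening a new chunk when the current one is full. */
    void add(float message) {
        if (chunks.isEmpty() || chunks.get(chunks.size() - 1).remaining() < Float.BYTES) {
            chunks.add(ByteBuffer.allocate(chunkBytes));
        }
        chunks.get(chunks.size() - 1).putFloat(message);
    }

    int chunkCount() {
        return chunks.size();
    }

    /** Walks the chunks in order so the caller sees one continuous stream. */
    @Override
    public Iterator<Float> iterator() {
        return new Iterator<Float>() {
            private int chunkIndex = 0;
            private int offset = 0; // byte offset inside the current chunk

            @Override
            public boolean hasNext() {
                // Move past chunks whose written bytes have all been read.
                while (chunkIndex < chunks.size()
                        && offset >= chunks.get(chunkIndex).position()) {
                    chunkIndex++;
                    offset = 0;
                }
                return chunkIndex < chunks.size();
            }

            @Override
            public Float next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                float value = chunks.get(chunkIndex).getFloat(offset);
                offset += Float.BYTES;
                return value;
            }
        };
    }
}
```

With 8-byte chunks, five messages end up in three chunks and still iterate in
insertion order, so no single array ever has to grow past the chunk size.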

I do not want to go into the details, but I need the number of messages to
be the same as the number of edges in the graph.


On Thu, Feb 21, 2013 at 4:22 PM, Maja Kabiljo <[email protected]> wrote:

> Yeah, that's exactly what I told you.
> We have one ExtendedDataOutput per vertex. And the limit is about
> (2^30)/4, since you will have double of the size allocated at some point.
> What do you mean you don't want to use a combiner for evaluation reasons?
>
> On 2/21/13 5:38 AM, "Claudio Martella" <[email protected]> wrote:
>
> >Yep. Actually more than a supernode sending messages it looks more like a
> >supernode receiving a lot of messages, and hence filling the inbox queue
> >(in bytearrayformat).
> >I assumed a max (2^31)/4 (for float messages) limit because I assumed we
> >had such an object per vertex, but it looks like the ByteArrayOutputFormat
> >is used for more than a single vertex.
> >Maja, do you think it could make sense to use one array per vertex (I
> >haven't looked deep at the code, but I assume that we have one per
> >partition now)?
> >
> >
> >
> >On Thu, Feb 21, 2013 at 12:49 AM, Eli Reisman
> ><[email protected]>wrote:
> >
> >> Sorry, didn't see the bottom of this thread before! I like Maja's idea:
> >> some mega-supernode (that's a technical term) just sent that vertex so
> >> many messages that the byte buffer size overflowed on the right shift. I
> >> saw this problem in some other forms last summer. The number of other
> >> ways that number could go negative seems limited.
> >>
> >>
> >> On Wed, Feb 20, 2013 at 3:41 PM, Eli Reisman <[email protected]
> >> >wrote:
> >>
> >> > I do know that when you run out of memory it often looks like the logs
> >> > are fine, or seem to end in a weird spot, but really the last X log
> >> > messages just don't get logged. Sometimes it's worth looking at the
> >> > worker logs individually, and one in every group of them will actually
> >> > get a sensible exception message logged. Don't know if that's the case
> >> > though. I would tend to agree with your intuition regarding the queues
> >> > and concurrency.
> >> >
> >> > On Tue, Feb 19, 2013 at 3:04 PM, Claudio Martella <
> >> > [email protected]> wrote:
> >> >
> >> >> Looking at the second line of the logs, it does not look like the
> >> >> case, but it could be. Do you have any suggestion for the flags in
> >> >> particular? I'm running with -XX:+HeapDumpOnOutOfMemoryError now, and I
> >> >> don't get any dump.
> >> >>
> >> >> It could be the case, but the behavior is quite consistent currently;
> >> >> I'd get a Child error/OOM pretty soon in these cases.
> >> >>
> >> >>
> >> >> On Tue, Feb 19, 2013 at 11:45 PM, Avery Ching <[email protected]>
> >> wrote:
> >> >>
> >> >> > Can you add GC flags?  Perhaps you are running into full GCs...
> >> >> >
> >> >> >
> >> >> > On 2/19/13 1:45 PM, Claudio Martella wrote:
> >> >> >
> >> >> >> I have not investigated further yet, but this does not look connected
> >> >> >> to ZK, but mostly to some concurrency on the queues. The number of
> >> >> >> requests stays at 29...
> >> >> >>
> >> >> >>
> >> >> >> On Tue, Feb 19, 2013 at 9:58 PM, Sebastian Schelter
> >><[email protected]>
> >> >> >> wrote:
> >> >> >>
> >> >> >>> For me, Giraph on Hadoop 1.0.4 runs into some kind of deadlock after
> >> >> >>> the input was loaded when using it with 200 workers on a
> >> >> >>> billion-scale graph....
> >> >> >>>
> >> >> >>> I tried several versions of zookeeper, but nothing helped.
> >> >> >>>
> >> >> >>> Best,
> >> >> >>> Sebastian
> >> >> >>>
> >> >> >>>
> >> >> >>> On 19.02.2013 21:55, Claudio Martella wrote:
> >> >> >>>
> >> >> >>>> I'm running PR with trunk on yahoo webgraph across 528 workers and I
> >> >> >>>> get the workers hanging at the flushing:
> >> >> >>>>
> >> >> >>>> 2013-02-19 21:40:07,187 INFO org.apache.giraph.graph.ComputeCallable: call: Computation took 28.386961 secs for 95 partitions on superstep 0. Flushing started
> >> >> >>>> 2013-02-19 21:40:07,604 INFO org.apache.giraph.worker.BspServiceWorker: finishSuperstep: Waiting on all requests, superstep 0 Memory (free/total/max) = 5993.42M / 8661.44M / 14346.69M
> >> >> >>>> 2013-02-19 21:40:07,605 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 558 open requests, waiting for it to be <= 0, MBytes/sec sent = 6.4758, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0003, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 27.083
> >> >> >>>> 2013-02-19 21:40:26,717 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 39 open requests, waiting for it to be <= 0, MBytes/sec sent = 3.7965, MBytes/sec received = 0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0068, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 46.195
> >> >> >>>> 2013-02-19 21:40:32,248 INFO org.apache.giraph.comm.netty.handler.RequestDecoder: decode: Server window metrics MBytes/sec sent = 0.0002, MBytes/sec received = 3.9824, MBytesSent = 0.0066, MBytesReceived = 155.4787, ave sent req MBytes = 0, ave received req MBytes = 0.0155, secs waited = 39.04
> >> >> >>>> 2013-02-19 21:40:42,041 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 36 open requests, waiting for it to be <= 0, MBytes/sec sent = 2.8509, MBytes/sec received = 0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0068, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 61.519
> >> >> >>>> 2013-02-19 21:41:07,456 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 2.0174, MBytes/sec received = 0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 86.933
> >> >> >>>> 2013-02-19 21:41:22,456 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 1.7205, MBytes/sec received = 0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 101.934
> >> >> >>>> 2013-02-19 21:41:37,457 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 1.4998, MBytes/sec received = 0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 116.935
> >> >> >>>> 2013-02-19 21:41:52,458 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 1.3293, MBytes/sec received = 0.0001, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 131.936
> >> >> >>>> 2013-02-19 21:42:22,459 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 1.083, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 161.937
> >> >> >>>> 2013-02-19 21:42:37,460 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.9912, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 176.938
> >> >> >>>> 2013-02-19 21:43:07,461 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.8475, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 206.939
> >> >> >>>> 2013-02-19 21:43:37,462 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.7402, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 236.94
> >> >> >>>> 2013-02-19 21:43:52,463 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.6961, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 251.941
> >> >> >>>> 2013-02-19 21:44:07,464 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.657, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 266.942
> >> >> >>>> 2013-02-19 21:44:37,465 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.5906, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 296.943
> >> >> >>>> 2013-02-19 21:45:07,466 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.5364, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 326.944
> >> >> >>>> 2013-02-19 21:45:22,467 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.5129, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 341.945
> >> >> >>>> 2013-02-19 21:45:37,468 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.4913, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 356.946
> >> >> >>>> 2013-02-19 21:46:07,469 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.4532, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 386.947
> >> >> >>>> 2013-02-19 21:46:22,470 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.4363, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 401.948
> >> >> >>>> 2013-02-19 21:46:52,471 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.406, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 431.949
> >> >> >>>> 2013-02-19 21:47:07,472 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.3924, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 446.95
> >> >> >>>> 2013-02-19 21:47:37,473 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.3677, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 476.951
> >> >> >>>> 2013-02-19 21:48:07,474 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.346, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 506.952
> >> >> >>>> 2013-02-19 21:48:22,475 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.336, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 521.953
> >> >> >>>> 2013-02-19 21:48:52,476 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.3177, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 551.954
> >> >> >>>> 2013-02-19 21:49:07,477 INFO org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 29 open requests, waiting for it to be <= 0, MBytes/sec sent = 0.3093, MBytes/sec received = 0, MBytesSent = 175.3834, MBytesReceived = 0.0069, ave sent req MBytes = 0.2998, ave received req MBytes = 0, secs waited = 566.955
> >> >> >>>>
> >> >> >>>> It looks similar to the problem we had earlier with concurrency.
> >> >> >>>>
> >> >> >>>> Are you guys experiencing something similar at large-scale?
> >> >> >>>>
> >> >> >>>>
> >> >> >>>
> >> >> >>
> >> >> >
> >> >>
> >> >>
> >> >> --
> >> >>    Claudio Martella
> >> >>    [email protected]
> >> >>
> >> >
> >> >
> >>
> >
> >
> >
> >--
> >   Claudio Martella
> >   [email protected]
>
>


-- 
   Claudio Martella
   [email protected]
