Also, technically, all vectors in a valid matrix should be (or are expected to
be) of the same length (which doesn't mean they actually have to have all
elements -- or even all row vectors -- present, of course). So if needed, just
run a simple validation map before drmWrap to verify or to clean this up,
whichever is suitable.
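For example, something along these lines might do (a rough, untested sketch;
rowRdd and nCol are just placeholder names for your row RDD and the column
count you intend to declare):

    import org.apache.mahout.math.Vector
    import org.apache.mahout.sparkbindings._

    // sketch: fail fast if any row vector reports a length other than the
    // expected column count, before the RDD is handed to drmWrap
    def validateRows(rowRdd: DrmRdd[Int], nCol: Int): DrmRdd[Int] =
      rowRdd.map { case (key, v: Vector) =>
        require(v.size == nCol, s"row $key has length ${v.size}, expected $nCol")
        key -> v
      }

    // hypothetical usage: val drmA = drmWrap(validateRows(rowRdd, nCol), ncol = nCol)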



On Mon, Nov 17, 2014 at 5:24 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> I do use drmWrap so I’ll check there, thanks
>
> On Nov 17, 2014, at 5:22 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>
> On Mon, Nov 17, 2014 at 5:16 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
> > It’s in spark-itemsimilarity. This job reads elements and assigns them to
> > one of two RDD backed drms.
> >
> > I assumed it was a badly formed drm but it’s a 140MB dataset and a bit
> > hard to nail down—just looking for a clue. I read this to say that an ID
> > for an element in a row vector was larger than drm.ncol, correct?
> >
>
> yes.
>
> and then it again comes back to the question of how the matrix was
> constructed. Computation of the dimensions (ncol, nrow) is automatic and
> lazy, meaning that if you have not specified the dimensions explicitly
> anywhere, they will be lazily computed for you. But if you did volunteer
> them anywhere (such as in a drmWrap() call), they have to be correct.
> Otherwise you see things like this.
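> For instance, roughly (an untested sketch; rowRdd, nRow and nCol stand in
> for your data and its true dimensions):
>
>     import org.apache.mahout.sparkbindings._
>
>     // dimensions volunteered explicitly: ncol/nrow must then cover every index
>     // actually present in the row vectors, or you get an IndexException like below
>     val drmA = drmWrap(rowRdd, nrow = nRow, ncol = nCol)
>
>     // nothing volunteered: nrow/ncol get computed lazily from the data itself
>     val drmB = drmWrap(rowRdd)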
>
> >
> >
> > On Nov 17, 2014, at 4:58 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> >
> > So this is not a problem of the A'A computation -- the input is obviously
> > invalid.
> >
> > The question is what you did before you got the A handle -- read it from a
> > file? parallelized it from an in-core matrix (drmParallelize)? obtained it
> > as a result of another computation (if yes, then which)? wrapped it around
> > a manually crafted RDD (drmWrap)?
> >
> > I don't understand the question about non-contiguous ids. You are referring
> > to some context of your computation assuming I am in that context (but I am
> > unfortunately not).
> >
> > On Mon, Nov 17, 2014 at 4:55 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> >
> >>
> >>
> >> On Mon, Nov 17, 2014 at 3:46 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> >>
> >>> A matrix with about 4600 rows and somewhere around 27790 columns (not
> >>> sure of the exact dimensions). When executing the following line from
> >>> AtA:
> >>>
> >>>    /** The version of A'A that does not use GraphX */
> >>>    def at_a_nongraph(op: OpAtA[_], srcRdd: DrmRdd[_]): DrmRdd[Int] = {
> >>>
> >>> a vector is created whose size causes the error. How could I have
> >>> constructed a drm that would cause this error? If the column IDs were
> >>> non-contiguous, would that yield this error?
> >>>
> >>
> >> what did you do specifically to build matrix A?
> >>
> >>
> >>> ==================
> >>>
> >>> 14/11/12 17:56:03 ERROR executor.Executor: Exception in task 5.0 in stage 18.0 (TID 66169)
> >>> org.apache.mahout.math.IndexException: Index 27792 is outside allowable range of [0,27789)
> >>>       at org.apache.mahout.math.AbstractVector.viewPart(AbstractVector.java:147)
> >>>       at org.apache.mahout.math.scalabindings.VectorOps.apply(VectorOps.scala:37)
> >>>       at org.apache.mahout.sparkbindings.blas.AtA$$anonfun$5$$anonfun$apply$6.apply(AtA.scala:152)
> >>>       at org.apache.mahout.sparkbindings.blas.AtA$$anonfun$5$$anonfun$apply$6.apply(AtA.scala:149)
> >>>       at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:376)
> >>>       at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:376)
> >>>       at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1085)
> >>>       at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1077)
> >>>       at scala.collection.immutable.StreamIterator$$anonfun$next$1.apply(Stream.scala:980)
> >>>       at scala.collection.immutable.StreamIterator$$anonfun$next$1.apply(Stream.scala:980)
> >>>       at scala.collection.immutable.StreamIterator$LazyCell.v$lzycompute(Stream.scala:969)
> >>>       at scala.collection.immutable.StreamIterator$LazyCell.v(Stream.scala:969)
> >>>       at scala.collection.immutable.StreamIterator.hasNext(Stream.scala:974)
> >>>       at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
> >>>       at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:137)
> >>>       at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
> >>>       at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:55)
> >>>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> >>>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> >>>       at org.apache.spark.scheduler.Task.run(Task.scala:54)
> >>>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
> >>>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> >>>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> >>>       at java.lang.Thread.run(Thread.java:695)
> >>>
> >>>
> >>
> >
> >
>
>
