using root LLR

2016-11-15 Thread Pat Ferrel
I understand the eyeball method, but I'm not sure users will, so I am working on a 
t-digest calculation of an LLR threshold. The goal is to maintain a certain 
sparsity at maximum “quality”. But I have a few questions.

You mention root LLR. OK, but that will create negative numbers. I assume:

1) We should use the absolute value of root LLR for ranking in the max # of 
indicators sense. There seems to be no value in computing sqrt( | rootLLR | ) since 
the ranking will not change, but we can’t just use the value returned by the Java 
root LLR function directly.
2) Likewise, we use the absolute value of root LLR to compare with the 
threshold. Put another way, without using the absolute value, a value passes the 
LLR threshold test if mean - threshold < value < mean + threshold.
3) However, the positive and negative root LLR values would both be used in the 
t-digest quantile calculation, which ideally would have mean = 0.

Seems simple, but just checking my understanding: are these correct?
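For concreteness, here is a minimal Scala sketch of the reading in 1)-3) above. It is illustrative only, not Mahout's implementation; the contingency counts and the threshold of 5 are made-up numbers.

```
import org.apache.mahout.math.stats.LogLikelihood

// hypothetical contingency counts for one (item, indicator) pair
val signedRoot = LogLikelihood.rootLogLikelihoodRatio(13L, 1000L, 1000L, 100000L)

val magnitude = math.abs(signedRoot)   // 1) rank candidates by |rootLLR|
val threshold = 5.0
val passes = magnitude > threshold     // 2) compare the same |rootLLR| to the threshold
// 3) signedRoot itself (positive or negative) is what would be fed into the
//    t-digest, so the quantile sketch sees the full distribution around 0
```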


On Jan 2, 2016, at 3:17 PM, Ted Dunning  wrote:


I usually like to use a combination of a fixed threshold for llr plus a max 
number of indicators.

The fixed threshold I use is typically around 20-30 for raw LLR which 
corresponds to about 5 for root LLR. I often eyeball the lists of indicators 
for items that I understand to find a point where the list of indicators 
becomes about half noise, half useful indicators.
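In code, the combination might look roughly like this; the names and default values are illustrative, not Mahout's actual code:

```
// Keep only indicators whose root-LLR score clears a fixed threshold,
// then cap the list at a maximum length, strongest first.
def selectIndicators(scored: Seq[(Int, Double)],   // (indicatorId, rootLLR score)
                     minRootLlr: Double = 5.0,     // roughly 20-30 on the raw LLR scale
                     maxIndicators: Int = 50): Seq[Int] =
  scored
    .filter { case (_, llr) => llr > minRootLlr }
    .sortBy { case (_, llr) => -llr }
    .take(maxIndicators)
    .map(_._1)
```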





On Sat, Jan 2, 2016 at 2:15 PM, Pat Ferrel wrote:
One interesting thing we saw is that like-genre was better discarded and 
dislike-genre left in the mix.

This brings up a fundamental issue with how we use LLR to downsample in Mahout. 
In this case by downsampling I mean llr(A’B), where we keep some max number of 
indicators based on the best LLR score. For the primary action—something like 
“buy”—this works well since there are usually quite a lot of items, but for B 
there may be very few items; genre is an example. Using the same max # of 
indicators for A’A as well as all the rest (A’B, etc.) means that very little, 
if any, downsampling based on LLR score happens for A’B. So for A’B the result 
is really more like simple cross-cooccurrence.

This seems worth addressing, if only because in our analysis the effect made 
like-genre useless, when intuition would say that it should be useful. Our 
hypothesis is that since no downsampling happened and very many of the 
reviewers preferred almost all of the genres, it had no differentiating value. If 
we had changed the per-item max indicators to some smaller number, this might 
have left only strongly correlated like-genre indicators.

Assuming I’ve identified the issue correctly, the options I can think of are:
1) use a fixed LLR threshold for A’B or any other cross-cooccurrence 
indicator. This seems pretty impractical. 
2) add a max-indicators threshold param for each of the secondary indicators. 
This would be fairly easy and could be based on the # of B items. Some method 
of choosing this might end up being ~100 for A’A (the default), and a function 
of the # of items in B, C, etc. The plus is that this would be easy and keep 
the calculation at O(n), but the function that returns 100 for A, and some 
smaller number for B, C, and the rest, is not clear to me (one possibility is 
sketched after this list).
3) create a threshold based on the distribution of llr(A’B). This could be 
based on a correlation confidence (actually confidence of non-correlation for 
LLR). The downside is that this means we need to calculate all of llr(A’B), 
which approaches O(n^2), and then do the downsampling of the complete llr(A’B). This 
removes the rather significant practical benefit of the current downsampling 
algorithm. Practically speaking, most indicators will be of dimensionality on 
the order of the # of A items or will be very much smaller, like the # of genres. 
So maybe calculating the distribution of llr(A’B) wouldn’t be too bad if only 
done when B has a small number of items. In the small-B case it would be O(n*m), 
where m is the number of items in B and n is the number of items in A, and m << 
n, so this would nearly be O(n). Also, this could be mixed with #2 and only 
calculated every so often since it probably won’t change very much in any one 
application.
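To illustrate option 2, here is a hypothetical per-indicator cap as a function of the secondary action's item count; the scaling rule (a quarter of the item count, capped at the A'A default) is purely a guess for discussion, not a recommendation:

```
// Cap the number of kept indicators per item based on how many items the
// secondary action has; tiny item sets (e.g. ~20 genres) get a much
// smaller cap than the ~100 default used for A'A.
def maxIndicatorsFor(numItems: Int, primaryDefault: Int = 100): Int =
  math.min(primaryDefault, math.max(1, numItems / 4))

// maxIndicatorsFor(500000) == 100  (a catalog-sized A'A keeps the default)
// maxIndicatorsFor(20)     == 5    (a 20-genre B gets a much tighter cap)
```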

I guess I’d be inclined to test by trying a range of max # of indicators on our 
test data, since the number of genres is small. If there is any point in the range 
that produces significantly better results, we could proceed to try the confidence 
method and see if it allows us to calculate the optimal #. If so, then we could 
implement this for very occasional calculation on live datasets.

Any advice?

> On Dec 30, 2015, at 2:26 PM, Ted Dunning wrote:
> 
> 
> This is really nice work!
> 
> On Wed, Dec 30, 2015 at 11:50 AM, Pat Ferrel wrote:
> As many of you know Mahout-Samsara includes an interesting and 

Re: [jira] [Commented] (MAHOUT-1892) Can't broadcast vector in Mahout-Shell

2016-11-15 Thread Trevor Grant
Yes, adding `.value` has no effect.
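For reference, the variant being discussed is presumably something like the following (the same code as in the JIRA quoted below, but dereferencing the broadcast explicitly instead of relying on the implicit conversion); per the above, it still hits the serialization error in the shell:

```
val drm2 = A.mapBlock() {
  case (keys, block) =>
    val vLocal = bcastV.value   // explicit .value, no implicit BCast -> Vector conversion
    for (row <- 0 until block.nrow) block(row, ::) -= vLocal
    keys -> block
}
```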
On Nov 15, 2016 1:25 PM, "Dmitriy Lyubimov (JIRA)"  wrote:

>
> [ https://issues.apache.org/jira/browse/MAHOUT-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668009#comment-15668009 ]
>
> Dmitriy Lyubimov commented on MAHOUT-1892:
> --
>
> Shell is a mystery. Obviously it tries to drag A itself into the mapBlock
> closure, but why is escaping me.
>
> What happens if we remove the implicit conversion (i.e. use bcastV.value
> explicitly inside the closure)? Is it still happening?
>
> > Can't broadcast vector in Mahout-Shell
> > --
> >
> > Key: MAHOUT-1892
> > URL: https://issues.apache.org/jira/browse/MAHOUT-1892
> > Project: Mahout
> >  Issue Type: Bug
> >Reporter: Trevor Grant
> >
> > When attempting to broadcast a Vector in Mahout's spark-shell with
> `mapBlock` we get serialization errors.  **NOTE** scalars can be broadcast
> without issue.
> > I did some testing in the "Zeppelin Shell" for lack of a better term.
> See https://github.com/apache/zeppelin/pull/928
> > The same `mapBlock` code I ran in the spark-shell (below) also generated
> > errors.  However, wrapping the mapBlock into a function in a compiled jar
> > https://github.com/apache/mahout/pull/246/commits/ccb5da65330e394763928f6dc51d96e38debe4fb#diff-4a952e8e09ae07e0b3a7ac6a5d6b2734R25
> > and then running said function from the Mahout Shell or in the "Zeppelin
> > Shell" (using Spark or Flink as a runner) works fine.
> > Consider
> > ```
> > mahout> val inCoreA = dense((1, 2, 3), (3, 4, 5))
> > val A = drmParallelize(inCoreA)
> > val v: Vector = dvec(1,1,1)
> > val bcastV = drmBroadcast(v)
> > val drm2 = A.mapBlock() {
> > case (keys, block) =>
> > for(row <- 0 until block.nrow) block(row, ::) -= bcastV
> > keys -> block
> > }
> > drm2.checkpoint()
> > ```
> > Which emits the stack trace:
> > ```
> > org.apache.spark.SparkException: Task not serializable
> > at org.apache.spark.util.ClosureCleaner$.ensureSerializable(
> ClosureCleaner.scala:304)
> > at org.apache.spark.util.ClosureCleaner$.org$apache$
> spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
> > at org.apache.spark.util.ClosureCleaner$.clean(
> ClosureCleaner.scala:122)
> > at org.apache.spark.SparkContext.clean(SparkContext.scala:2032)
> > at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:318)
> > at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:317)
> > at org.apache.spark.rdd.RDDOperationScope$.withScope(
> RDDOperationScope.scala:147)
> > at org.apache.spark.rdd.RDDOperationScope$.withScope(
> RDDOperationScope.scala:108)
> > at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
> > at org.apache.spark.rdd.RDD.map(RDD.scala:317)
> > at org.apache.mahout.sparkbindings.blas.MapBlock$.
> exec(MapBlock.scala:33)
> > at org.apache.mahout.sparkbindings.SparkEngine$.
> tr2phys(SparkEngine.scala:338)
> > at org.apache.mahout.sparkbindings.SparkEngine$.
> toPhysical(SparkEngine.scala:116)
> > at org.apache.mahout.math.drm.logical.CheckpointAction.
> checkpoint(CheckpointAction.scala:41)
> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.
> (:58)
> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<
> init>(:68)
> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:70)
> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:72)
> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:74)
> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:76)
> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:78)
> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$
> $iwC$$iwC$$iwC$$iwC$$iwC.(:80)
> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$
> $iwC$$iwC$$iwC$$iwC.(:82)
> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$
> $iwC$$iwC$$iwC.(:84)
> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$
> $iwC$$iwC.(:86)
> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$
> $iwC.(:88)
> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.
> (:90)
> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<
> init>(:92)
> > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(<
> 

Re: [DISCUSS] More meaningful error when running on Spark 2.0

2016-11-15 Thread Andrew Musselman
+1

On Tue, Nov 15, 2016 at 9:23 AM, Dmitriy Lyubimov  wrote:

> +1 on version checking.
> And, there's a little bug as well. This error is technically generated by
> something like
>
> dense(Set.empty[Vector]),
>
> i.e., it cannot form a matrix out of an empty collection of vectors. While
> this is true, I suppose it needs a `require(...)` insert there to generate
> a more meaningful response instead of letting Scala complain about the
> empty collection.
>
> -d
>
>
> On Mon, Nov 14, 2016 at 7:32 AM, Andrew Palumbo 
> wrote:
>
> > +1
> >
> >
> >
> > Sent from my Verizon Wireless 4G LTE smartphone
> >
> >
> >  Original message 
> > From: Trevor Grant 
> > Date: 11/14/2016 6:49 AM (GMT-08:00)
> > To: dev@mahout.apache.org
> > Subject: [DISCUSS] More meaningful error when running on Spark 2.0
> >
> > Hi,
> >
> > Currently, when running on Spark 2.0, the user will hit some sort of error;
> > one such error is:
> >
> > java.util.NoSuchElementException: next on empty iterator
> > at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
> > at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
> > at scala.collection.IndexedSeqLike$Elements.next(
> IndexedSeqLike.scala:63)
> > at scala.collection.IterableLike$class.head(IterableLike.scala:107)
> > at scala.collection.mutable.ArrayOps$ofRef.scala$collection$Ind
> > exedSeqOptimized$$super$head(ArrayOps.scala:186)
> > at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOp
> > timized.scala:126)
> > at scala.collection.mutable.ArrayOps$ofRef.head(ArrayOps.scala:186)
> > at org.apache.mahout.math.scalabindings.package$$anonfun$1.appl
> > y(package.scala:155)
> > at org.apache.mahout.math.scalabindings.package$$anonfun$1.appl
> > y(package.scala:133)
> > at scala.collection.TraversableLike$$anonfun$map$1.apply(Traver
> > sableLike.scala:234)
> > at scala.collection.TraversableLike$$anonfun$map$1.apply(Traver
> > sableLike.scala:234)
> > at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSe
> > qOptimized.scala:33)
> > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
> > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> > at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> > at org.apache.mahout.math.scalabindings.package$.dense(
> package.scala:133)
> > at org.apache.mahout.sparkbindings.SparkEngine$.drmSampleKRows(
> > SparkEngine.scala:289)
> > at org.apache.mahout.math.drm.package$.drmSampleKRows(package.scala:149)
> > at org.apache.mahout.math.drm.package$.drmSampleToTSV(package.scala:165)
> > ... 58 elided
> >
> > With the recent Zeppelin-Mahout integration, there are going to be a lot of
> > users unknowingly attempting to run Mahout on Spark 2.0.  I think it
> > would be simple to implement, yet save a lot of time on the Zeppelin and
> > Mahout mailing lists, to do something like:
> >
> > if sc.version > 1.6.2 then:
> >    error("Spark version ${sc.version} isn't supported.  Please see
> > MAHOUT-... (appropriate jira info)")
> >
> > I'd like to put something together and, depending on how many issues people
> > have on the Zeppelin list, be prepared to do a hotfix on 0.12.2 if it becomes
> > prudent.  Everyone always complaining that Zeppelin doesn't work because of
> > some mystical error is bad PR.  It DOES say in the notebook and elsewhere
> > that we're not 2.0 compliant; however, one of the advantages/drawbacks of
> > Zeppelin is that without having to really know what you're doing you can
> > get a functional local cluster of Flink, Spark, etc. all going.
> >
> > So we easily could have a space where someone read none of the docs and is
> > whining.  Surely few if any would ever do such a thing, but still I think it's
> > a prudent fix to have in the back pocket.
> >
> > tg
> >
> > Trevor Grant
> > Data Scientist
> > https://github.com/rawkintrevo
> > http://stackexchange.com/users/3002022/rawkintrevo
> > http://trevorgrant.org
> >
> > *"Fortunate is he, who is able to know the causes of things."  -Virgil*
> >
>


[jira] [Commented] (MAHOUT-1892) Can't broadcast vector in Mahout-Shell

2016-11-15 Thread Dmitriy Lyubimov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668009#comment-15668009
 ] 

Dmitriy Lyubimov commented on MAHOUT-1892:
--

Shell is a mystery. Obviously it tries to drag A itself into the mapBlock 
closure, but why is escaping me.

What happens if we remove the implicit conversion (i.e. use bcastV.value explicitly 
inside the closure)? Is it still happening?

> Can't broadcast vector in Mahout-Shell
> --
>
> Key: MAHOUT-1892
> URL: https://issues.apache.org/jira/browse/MAHOUT-1892
> Project: Mahout
>  Issue Type: Bug
>Reporter: Trevor Grant
>
> When attempting to broadcast a Vector in Mahout's spark-shell with `mapBlock` 
> we get serialization errors.  **NOTE** scalars can be broadcast without issue.
> I did some testing in the "Zeppelin Shell" for lack of a better term.  See 
> https://github.com/apache/zeppelin/pull/928
> The same `mapBlock` code I ran in the spark-shell (below) also generated 
> errors.  However, wrapping the mapBlock into a function in a compiled jar 
> https://github.com/apache/mahout/pull/246/commits/ccb5da65330e394763928f6dc51d96e38debe4fb#diff-4a952e8e09ae07e0b3a7ac6a5d6b2734R25
>  and then running said function from the Mahout Shell or in the "Zeppelin 
> Shell" (using Spark or Flink as a runner) works fine.  
> Consider
> ```
> mahout> val inCoreA = dense((1, 2, 3), (3, 4, 5))
> val A = drmParallelize(inCoreA)
> val v: Vector = dvec(1,1,1)
> val bcastV = drmBroadcast(v)
> val drm2 = A.mapBlock() {
> case (keys, block) =>
> for(row <- 0 until block.nrow) block(row, ::) -= bcastV
> keys -> block
> }
> drm2.checkpoint()
> ```
> Which emits the stack trace:
> ```
> org.apache.spark.SparkException: Task not serializable
> at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
> at 
> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
> at 
> org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
> at org.apache.spark.SparkContext.clean(SparkContext.scala:2032)
> at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:318)
> at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:317)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
> at org.apache.spark.rdd.RDD.map(RDD.scala:317)
> at 
> org.apache.mahout.sparkbindings.blas.MapBlock$.exec(MapBlock.scala:33)
> at 
> org.apache.mahout.sparkbindings.SparkEngine$.tr2phys(SparkEngine.scala:338)
> at 
> org.apache.mahout.sparkbindings.SparkEngine$.toPhysical(SparkEngine.scala:116)
> at 
> org.apache.mahout.math.drm.logical.CheckpointAction.checkpoint(CheckpointAction.scala:41)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:58)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:68)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:70)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:72)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:74)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:76)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:78)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:80)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:82)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:84)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:86)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:88)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:90)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:92)
> at 
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:94)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:96)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:98)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:100)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:102)
> at 

Re: [DISCUSS] More meaningful error when running on Spark 2.0

2016-11-15 Thread Dmitriy Lyubimov
+1 on version checking.
And, there's a little bug as well. This error is technically generated by
something like

dense(Set.empty[Vector]),

i.e., it cannot form a matrix out of an empty collection of vectors. While
this is true, I suppose it needs a `require(...)` insert there to generate
a more meaningful response instead of letting Scala complain about the
empty collection.

-d
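A minimal sketch of the kind of `require(...)` guard being suggested, written as a standalone check rather than a patch to the real dense() (whose actual signature may differ); the message text is illustrative:

```
import org.apache.mahout.math.Vector

// Fail fast with a readable message instead of letting Scala blow up on
// head-of-empty-collection deep inside the matrix construction.
def requireNonEmptyRows(rows: Seq[Vector]): Unit =
  require(rows.nonEmpty,
    "Cannot build a dense matrix from an empty collection of row vectors " +
      "(often a symptom of sampling failing upstream, e.g. on an unsupported Spark version).")
```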


On Mon, Nov 14, 2016 at 7:32 AM, Andrew Palumbo  wrote:

> +1
>
>
>
> Sent from my Verizon Wireless 4G LTE smartphone
>
>
>  Original message 
> From: Trevor Grant 
> Date: 11/14/2016 6:49 AM (GMT-08:00)
> To: dev@mahout.apache.org
> Subject: [DISCUSS] More meaningful error when running on Spark 2.0
>
> Hi,
>
> Currently, when running on Spark 2.0, the user will hit some sort of error;
> one such error is:
>
> java.util.NoSuchElementException: next on empty iterator
> at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
> at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
> at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63)
> at scala.collection.IterableLike$class.head(IterableLike.scala:107)
> at scala.collection.mutable.ArrayOps$ofRef.scala$collection$Ind
> exedSeqOptimized$$super$head(ArrayOps.scala:186)
> at scala.collection.IndexedSeqOptimized$class.head(IndexedSeqOp
> timized.scala:126)
> at scala.collection.mutable.ArrayOps$ofRef.head(ArrayOps.scala:186)
> at org.apache.mahout.math.scalabindings.package$$anonfun$1.appl
> y(package.scala:155)
> at org.apache.mahout.math.scalabindings.package$$anonfun$1.appl
> y(package.scala:133)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(Traver
> sableLike.scala:234)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(Traver
> sableLike.scala:234)
> at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSe
> qOptimized.scala:33)
> at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> at org.apache.mahout.math.scalabindings.package$.dense(package.scala:133)
> at org.apache.mahout.sparkbindings.SparkEngine$.drmSampleKRows(
> SparkEngine.scala:289)
> at org.apache.mahout.math.drm.package$.drmSampleKRows(package.scala:149)
> at org.apache.mahout.math.drm.package$.drmSampleToTSV(package.scala:165)
> ... 58 elided
>
> With the recent Zeppelin-Mahout integration, there are going to be a lot of
> users unknowingly attempting to run Mahout on Spark 2.0.  I think it
> would be simple to implement, yet save a lot of time on the Zeppelin and
> Mahout mailing lists, to do something like:
>
> if sc.version > 1.6.2 then:
>    error("Spark version ${sc.version} isn't supported.  Please see
> MAHOUT-... (appropriate jira info)")
>
> I'd like to put something together and, depending on how many issues people
> have on the Zeppelin list, be prepared to do a hotfix on 0.12.2 if it becomes
> prudent.  Everyone always complaining that Zeppelin doesn't work because of
> some mystical error is bad PR.  It DOES say in the notebook and elsewhere
> that we're not 2.0 compliant; however, one of the advantages/drawbacks of
> Zeppelin is that without having to really know what you're doing you can
> get a functional local cluster of Flink, Spark, etc. all going.
>
> So we easily could have a space where someone read none of the docs and is
> whining.  Surely few if any would ever do such a thing, but still I think it's
> a prudent fix to have in the back pocket.
>
> tg
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things."  -Virgil*
>
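For reference, a minimal Scala sketch of the version guard proposed above, assuming a SparkContext `sc` is in scope; the cutoff logic and the message wording are placeholders, not the actual fix:

```
// Refuse to run on Spark versions newer than the 1.6.x line this Mahout
// release supports, with a readable message instead of a mystical error.
val versionOk = sc.version.split("\\.").take(2) match {
  case Array("1", minor) => minor.toInt <= 6
  case _                 => false
}
require(versionOk,
  s"Spark version ${sc.version} isn't supported by this Mahout release. " +
    "Please see the Mahout JIRA for Spark 2.0 support status.")
```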