[ https://issues.apache.org/jira/browse/SPARK-11439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992828#comment-14992828 ]
Nakul Jindal commented on SPARK-11439:
--------------------------------------
I seem to be running into a problem.
1. [This|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala#L124-L165] is the current implementation.
2. [This|https://gist.github.com/nakul02/9341a9ed67cd192d98df] is the
implementation that I tried first (and it passed all tests).
3. [This|https://gist.github.com/nakul02/4f5392c7d5997871da7b] is an improved
implementation that doesn't form the "x" array, but it fails tests in these suites:
* org.apache.spark.ml.regression.LinearRegressionSuite
* org.apache.spark.ml.evaluation.RegressionEvaluatorSuite
The difference between 2 and 3 is how the random number generator is used.
Could that alone cause the tests to fail? Maybe I am doing something obviously
stupid here.
This is frustrating, and any insight would help!
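One possible explanation, offered as an assumption rather than a diagnosis: with a fixed seed, the generator returns one fixed stream of numbers, so changing how many values are drawn, or in what order, hands different numbers to the features and to the noise. The generated dataset then changes even though every distribution is identical, and any test that compares fitted coefficients or metrics against hard-coded expected values would fail. The standalone sketch below (no Spark dependency; `DrawOrderSketch`, `batched`, and `interleaved` are hypothetical names) draws the same distributions in two orders from the same seed.

```scala
import scala.util.Random

// Hypothetical illustration: same seed, same distributions,
// two different orders of consuming the random stream.
object DrawOrderSketch {
  // Draw every feature for every point first, then all noise terms.
  // Each returned row is the features followed by that point's noise.
  def batched(nPoints: Int, nFeatures: Int, seed: Int): Array[Array[Double]] = {
    val rnd = new Random(seed)
    val x = Array.fill(nPoints, nFeatures)(rnd.nextDouble())
    val noise = Array.fill(nPoints)(rnd.nextGaussian())
    x.zip(noise).map { case (xi, e) => xi :+ e }
  }

  // Draw features and noise point by point, interleaving the two.
  def interleaved(nPoints: Int, nFeatures: Int, seed: Int): Array[Array[Double]] = {
    val rnd = new Random(seed)
    Array.fill(nPoints) {
      val xi = Array.fill(nFeatures)(rnd.nextDouble())
      xi :+ rnd.nextGaussian()
    }
  }

  def main(args: Array[String]): Unit = {
    val a = batched(3, 2, 42)
    val b = interleaved(3, 2, 42)
    // true: both versions start with the same two nextDouble() calls
    println(a(0).take(2).sameElements(b(0).take(2)))
    // After that, each value comes from a different position in the stream,
    // so the two datasets are expected to differ.
    println(a.flatten.sameElements(b.flatten))
  }
}
```

If this is the cause, neither version is wrong statistically; the tests would simply be pinned to one particular draw order.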
> Optimization of creating sparse feature without dense one
> ---------------------------------------------------------
>
> Key: SPARK-11439
> URL: https://issues.apache.org/jira/browse/SPARK-11439
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Kai Sasaki
> Priority: Minor
>
> Currently, generating sparse features in {{LinearDataGenerator}} requires
> creating dense vectors first. It would be more cost-efficient to avoid
> generating dense features when creating sparse ones.
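A minimal sketch of the idea in the description, assuming plain arrays rather than Spark's vector types: each point is built directly as parallel index/value arrays, so no dense array of length `numFeatures` is ever allocated. `SparseRowSketch`, `sparseRow`, and `sparseDot` are hypothetical names, and the uniform-rescaling formula is an assumption chosen so that each kept feature has the requested mean and variance (Uniform(0, 1) has variance 1/12).

```scala
import scala.util.Random

// Hypothetical sketch, not Spark's implementation.
object SparseRowSketch {
  // Builds one sparse point as (indices, values) without a dense intermediate.
  // `sparsity` is the probability that a given feature is zero (and skipped).
  def sparseRow(
      numFeatures: Int,
      xMean: Array[Double],
      xVariance: Array[Double],
      sparsity: Double,
      rnd: Random): (Array[Int], Array[Double]) = {
    require(0.0 <= sparsity && sparsity <= 1.0)
    require(xMean.length >= numFeatures && xVariance.length >= numFeatures)
    val indices = Array.newBuilder[Int]
    val values = Array.newBuilder[Double]
    var i = 0
    while (i < numFeatures) {
      // Keep feature i with probability 1 - sparsity.
      if (rnd.nextDouble() >= sparsity) {
        // Rescale Uniform(0, 1) to mean xMean(i) and variance xVariance(i).
        val u = rnd.nextDouble()
        indices += i
        values += (u - 0.5) * math.sqrt(12.0 * xVariance(i)) + xMean(i)
      }
      i += 1
    }
    (indices.result(), values.result())
  }

  // Dot product of a sparse row with dense weights, enough to form a label.
  def sparseDot(indices: Array[Int], values: Array[Double], weights: Array[Double]): Double = {
    var sum = 0.0
    var k = 0
    while (k < indices.length) {
      sum += values(k) * weights(indices(k))
      k += 1
    }
    sum
  }

  def main(args: Array[String]): Unit = {
    val n = 10
    val weights = Array.tabulate(n)(_.toDouble)
    val (idx, vals) = sparseRow(n, Array.fill(n)(0.0), Array.fill(n)(1.0), 0.7, new Random(42))
    println(s"non-zero indices: ${idx.mkString(", ")}; label without noise: ${sparseDot(idx, vals, weights)}")
  }
}
```

Because this draws a value only for the features it keeps, it consumes the random stream differently from a dense-then-sparsify approach, so the same seed yields a different dataset.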