[
https://issues.apache.org/jira/browse/SPARK-18274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-18274:
---
Fix Version/s: (was: 2.1.1)
2.1.0
> Memory leak in PySp
[
https://issues.apache.org/jira/browse/SPARK-18318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731415#comment-15731415
]
Nick Pentreath commented on SPARK-18318:
Went ahead and re-marked fix version to {{2.1.0}} since
[
https://issues.apache.org/jira/browse/SPARK-18319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731413#comment-15731413
]
Nick Pentreath commented on SPARK-18319:
Went ahead and re-marked fix version to {{2.1.0}} since
[
https://issues.apache.org/jira/browse/SPARK-18319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-18319:
---
Fix Version/s: (was: 2.2.0)
2.1.0
> ML, Graph 2.1 QA:
[
https://issues.apache.org/jira/browse/SPARK-18319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-18319:
---
Fix Version/s: (was: 2.1.1)
> ML, Graph 2.1 QA: API: Experimental, DeveloperApi, fi
[
https://issues.apache.org/jira/browse/SPARK-18318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-18318:
---
Fix Version/s: (was: 2.1.1)
(was: 2.2.0)
2.1.0
[
https://issues.apache.org/jira/browse/SPARK-18592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-18592:
---
Fix Version/s: (was: 2.2.0)
> Move DT/RF/GBT Param setter methods to subclas
[
https://issues.apache.org/jira/browse/SPARK-18320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731411#comment-15731411
]
Nick Pentreath commented on SPARK-18320:
Went ahead and re-marked fix version to {{2.1.0}} since
[
https://issues.apache.org/jira/browse/SPARK-18324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-18324:
---
Fix Version/s: (was: 2.2.0)
> ML, Graph 2.1 QA: Programming guide update and migrat
[
https://issues.apache.org/jira/browse/SPARK-18408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-18408:
---
Fix Version/s: (was: 2.2.0)
> API Improvements for
[
https://issues.apache.org/jira/browse/SPARK-18320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-18320:
---
Fix Version/s: (was: 2.1.1)
(was: 2.2.0)
2.1.0
[
https://issues.apache.org/jira/browse/SPARK-18408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731407#comment-15731407
]
Nick Pentreath commented on SPARK-18408:
Went ahead and re-marked fix version to {{2.1.0}} since
[
https://issues.apache.org/jira/browse/SPARK-18324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-18324:
---
Fix Version/s: (was: 2.1.1)
2.1.0
> ML, Graph 2.1 QA: Programm
[
https://issues.apache.org/jira/browse/SPARK-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731408#comment-15731408
]
Nick Pentreath commented on SPARK-18366:
Went ahead and re-marked fix version to {{2.1.0}} since
[
https://issues.apache.org/jira/browse/SPARK-18324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731410#comment-15731410
]
Nick Pentreath commented on SPARK-18324:
Went ahead and re-marked fix version to {{2.1.0}} since
[
https://issues.apache.org/jira/browse/SPARK-18612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731402#comment-15731402
]
Nick Pentreath commented on SPARK-18612:
Went ahead and re-marked fix version to {{2.1.0}} since
[
https://issues.apache.org/jira/browse/SPARK-18592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-18592:
---
Fix Version/s: (was: 2.1.1)
2.1.0
> Move DT/RF/GBT Param set
[
https://issues.apache.org/jira/browse/SPARK-18592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731406#comment-15731406
]
Nick Pentreath commented on SPARK-18592:
Went ahead and re-marked fix version to {{2.1.0}} since
[
https://issues.apache.org/jira/browse/SPARK-18612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-18612:
---
Fix Version/s: (was: 2.1.1)
2.1.0
> Leaked broadcasted variable Ml
[
https://issues.apache.org/jira/browse/SPARK-18408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-18408:
---
Fix Version/s: (was: 2.1.1)
2.1.0
> API Improvements for
[
https://issues.apache.org/jira/browse/SPARK-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-18366:
---
Fix Version/s: (was: 2.1.1)
2.1.0
> Add handleInvalid to Pysp
Indeed, it's being tracked here:
https://issues.apache.org/jira/browse/SPARK-18230 though no PR has been
opened yet.
On Tue, 6 Dec 2016 at 13:36 chris snow wrote:
> I'm using the MatrixFactorizationModel.predict() method and encountered
> the following exception:
>
> Name:
[
https://issues.apache.org/jira/browse/SPARK-18704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15721448#comment-15721448
]
Nick Pentreath commented on SPARK-18704:
Yeah, I like this idea. I've also been finding
[
https://issues.apache.org/jira/browse/SPARK-12347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15711095#comment-15711095
]
Nick Pentreath commented on SPARK-12347:
Since the PR is still WIP and this is not a blocker
[
https://issues.apache.org/jira/browse/SPARK-12347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-12347:
---
Target Version/s: 2.2.0 (was: 2.1.0)
> Write script to run all MLlib examples for test
[
https://issues.apache.org/jira/browse/SPARK-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-18366:
---
Assignee: Sandeep Singh
> Add handleInvalid to Pyspark for QuantileDiscreti
[
https://issues.apache.org/jira/browse/SPARK-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-18366:
---
Fix Version/s: (was: 2.1.0)
2.1.1
> Add handleInvalid to Pysp
[
https://issues.apache.org/jira/browse/SPARK-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath resolved SPARK-18366.
Resolution: Fixed
Fix Version/s: 2.1.0
Issue resolved by pull request 15817
[https
check out https://github.com/VinceShieh/Spark-AdaOptimizer
On Wed, 30 Nov 2016 at 10:52 WangJianfei
wrote:
> Hi devs:
> Normally, the adaptive learning rate methods can have faster
> convergence
> than standard SGD, so why don't we implement them?
> see the link
[
https://issues.apache.org/jira/browse/SPARK-18616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15704472#comment-15704472
]
Nick Pentreath commented on SPARK-18616:
Just a note that generally committers set Target Version
[
https://issues.apache.org/jira/browse/SPARK-18616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-18616:
---
Target Version/s: (was: 2.0.2)
> Pure Python Implementation of MLWritable for
[
https://issues.apache.org/jira/browse/SPARK-18608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15701839#comment-15701839
]
Nick Pentreath commented on SPARK-18608:
I've also been meaning to log this for a little while
Nick Pentreath created SPARK-18608:
--
Summary: Spark ML algorithms that check RDD cache level for
internal caching double-cache data
Key: SPARK-18608
URL: https://issues.apache.org/jira/browse/SPARK-18608
This is because currently GBTClassifier doesn't extend the
ClassificationModel abstract class, which in turn has the rawPredictionCol
and related methods for generating that column.
I'm actually not sure off hand whether this was because the GBT
implementation could not produce the raw prediction
[
https://issues.apache.org/jira/browse/SPARK-18450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-18450:
---
Component/s: ML
> Add AND-amplification to Locality Sensitive Hash
[
https://issues.apache.org/jira/browse/SPARK-18454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-18454:
---
Component/s: ML
> Changes to fix Nearest Neighbor Search for
[
https://issues.apache.org/jira/browse/SPARK-18408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-18408:
---
Component/s: ML
> API Improvements for
[
https://issues.apache.org/jira/browse/SPARK-18456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath resolved SPARK-18456.
Resolution: Fixed
Assignee: Seth Hendrickson
Fix Version/s: 2.1.0
>
[
https://issues.apache.org/jira/browse/SPARK-18023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15682471#comment-15682471
]
Nick Pentreath commented on SPARK-18023:
Linking SPARK-17136 which is really a blocker for adding
[
https://issues.apache.org/jira/browse/SPARK-16377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15682458#comment-15682458
]
Nick Pentreath commented on SPARK-16377:
Is this still a bug? As per your above comment seems we
[
https://issues.apache.org/jira/browse/SPARK-6346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15682455#comment-15682455
]
Nick Pentreath commented on SPARK-6346:
---
I think we can close this ticket? It's pretty old
@Holden look forward to the blog post - I think a user guide PR based on it
would also be super useful :)
On Fri, 18 Nov 2016 at 05:29 Holden Karau wrote:
> I've been working on a blog post around this and hope to have it published
> early next month
>
> On Nov 17,
[
https://issues.apache.org/jira/browse/SPARK-18441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670210#comment-15670210
]
Nick Pentreath commented on SPARK-18441:
Yes, it would be good to understand what this is all
alyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
On Mon, Nov 14, 2016 at 1:44 PM, Nick Pentreath <nick.pentre...@gmail.com>
wrote:
DataFrame.rdd returns an RDD[Row]. You'll need to use map to extract the
doubles from the test score and label DF.
But you may prefer to just use spark.ml evaluators, which work with
DataFrames. Try BinaryClassificationEvaluator.
On Mon, 14 Nov 2016 at 19:30, Bhaarat Sharma
LSH-based NN search and similarity join should be out in Spark 2.1 -
there's a little work being done still to clear up the APIs and some
functionality.
Check out https://issues.apache.org/jira/browse/SPARK-5992
On Mon, 14 Nov 2016 at 16:12, Kevin Mellott
wrote:
>
For now OHE supports a single column, so you would have to have 1000 OHE stages
in a pipeline. However, you can add them programmatically, so it is not too bad.
If the cardinality of each feature is quite low, it should be workable.
After that, use VectorAssembler to stitch the vectors together (which
accepts
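The recipe above (one OneHotEncoder stage per column, then a VectorAssembler) can be sketched in plain Python. This only illustrates what those stages compute, it is not the Spark API, and the column names and categories are invented for the example:

```python
# Sketch of per-column one-hot encoding followed by vector assembly.

def one_hot(value, categories):
    # Encode one categorical value as a 0/1 vector over its known categories.
    vec = [0.0] * len(categories)
    vec[categories.index(value)] = 1.0
    return vec

def assemble(vectors):
    # VectorAssembler's job: concatenate per-column vectors into one row vector.
    return [x for v in vectors for x in v]

columns = {"color": ["red", "green", "blue"], "size": ["S", "M"]}
row = {"color": "green", "size": "M"}

features = assemble(one_hot(row[c], cats) for c, cats in columns.items())
print(features)  # [0.0, 1.0, 0.0, 0.0, 1.0]
```

With low-cardinality features each per-column vector stays short, which is why the approach remains workable even with many columns.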
[
https://issues.apache.org/jira/browse/SPARK-18341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-18341:
---
Comment: was deleted
(was: Just for interest - why is an error code more desirable
[
https://issues.apache.org/jira/browse/SPARK-18341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15657117#comment-15657117
]
Nick Pentreath commented on SPARK-18341:
Just for interest - why is an error code more desirable
[
https://issues.apache.org/jira/browse/SPARK-18235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15636190#comment-15636190
]
Nick Pentreath commented on SPARK-18235:
This duplicates SPARK-13857. Please feel free to comment
[
https://issues.apache.org/jira/browse/SPARK-17772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-17772:
---
Assignee: Seth Hendrickson
Target Version/s: 2.1.0
> Add helper testing meth
[
https://issues.apache.org/jira/browse/SPARK-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath resolved SPARK-17138.
Resolution: Fixed
Assignee: Weichen Xu
Fix Version/s: 2.1.0
> Python
[
https://issues.apache.org/jira/browse/SPARK-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-18060:
---
Assignee: Seth Hendrickson
Target Version/s: 2.1.0
> Avoid unnecess
I have a PR for it - https://github.com/apache/spark/pull/12574
Sadly I've been tied up and haven't had a chance to work further on it.
The main issue outstanding is deciding on the transform semantics as well
as performance testing.
Any comments / feedback welcome especially on transform
Oh, also you mention 20 partitions. Is that how many you have? How many
ratings?
It may be worth trying to repartition to a larger number of partitions.
On Fri, 21 Oct 2016 at 17:04, Nick Pentreath <nick.pentre...@gmail.com>
wrote:
> I wonder if you can try with setting different blocks
t was going out of memory with the default size too.
>
> On Fri, Oct 21, 2016 at 5:31 AM, Nick Pentreath <nick.pentre...@gmail.com>
> wrote:
>
> Did you try not setting the blocks parameter? It will then try to set it
> automatically for your data size.
> On Fri, 21 Oct 2016
> block size to 20,000 also results in the same. So there is
> something I don't understand about how this is working.
>
> BTW, I am trying to find 50 latent factors (rank = 50).
>
> Do you have some insights as to how I should tweak things to get this
> working?
>
> Thanks,
> Nik
>
Currently no - GBT implements the predictors, not the classifier interface.
It might be possible to wrap it in a wrapper that extends the Classifier
trait.
Hopefully GBT will support multi-class at some point. But you can use
RandomForest which does support multi-class.
On Fri, 21 Oct 2016 at
The blocks param will set both user and item blocks.
Spark 2.0 supports user and item blocks for PySpark:
http://spark.apache.org/docs/latest/api/python/pyspark.ml.html#module-pyspark.ml.recommendation
On Fri, 21 Oct 2016 at 08:12 Nikhil Mishra
wrote:
> Hi,
>
> I
You can use the PolynomialExpansion in Spark ML (
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.feature.PolynomialExpansion
)
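As a rough illustration of what a polynomial expansion computes, here is a plain-Python sketch (not the Spark transformer; Spark's PolynomialExpansion may order the output terms differently):

```python
from itertools import combinations_with_replacement

def poly_expand(features, degree=2):
    # All products of input features up to the given degree, no constant term.
    out = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(len(features)), d):
            term = 1.0
            for i in combo:
                term *= features[i]
            out.append(term)
    return out

# Degree-2 expansion of [x, y] yields x, y, x^2, x*y, y^2:
print(poly_expand([2.0, 3.0]))  # [2.0, 3.0, 4.0, 6.0, 9.0]
```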
On Tue, 18 Oct 2016 at 21:47 miro wrote:
> Yes, I was thinking going down this road:
>
> On Thu, Oct 6, 2016 at 4:09 AM, Nick Pentreath <nick.pentre...@gmail.com>
[
https://issues.apache.org/jira/browse/SPARK-17784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15564946#comment-15564946
]
Nick Pentreath edited comment on SPARK-17784 at 10/11/16 8:59 AM:
--
It's
[
https://issues.apache.org/jira/browse/SPARK-17784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15564946#comment-15564946
]
Nick Pentreath commented on SPARK-17784:
It's actually to create a new `KMeans` estimator I
[
https://issues.apache.org/jira/browse/SPARK-14501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-14501:
---
Target Version/s: 2.1.0
> spark.ml parity for fpm - frequent it
There is a JIRA and PR for it -
https://issues.apache.org/jira/browse/SPARK-14709
On Tue, 27 Sep 2016 at 09:10 hxw黄祥为 wrote:
> I have found the spark ml package has implemented the naive bayes algorithm
> and the source code is simple.
>
> I am confused why the spark ml package doesn't
[
https://issues.apache.org/jira/browse/SPARK-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath closed SPARK-17407.
--
Resolution: Not A Problem
> Unable to update structured stream from
The scale factor was only to scale up the number of ratings in the dataset
for performance testing purposes, to illustrate the scalability of Spark
ALS.
It is not something you would normally do on your training dataset.
On Fri, 23 Sep 2016 at 20:07, Roshani Nagmote
Sorry, the original repo: https://github.com/karlhigley/spark-neighbors
On Wed, 21 Sep 2016 at 13:09 Nick Pentreath <nick.pentre...@gmail.com>
wrote:
> I should also point out another library I had not come across before :
> https://github.com/sethah/spark-neighbors
>
>
>
in a mere 65 seconds! Thanks so much for the help!
>
> On Tue, Sep 20, 2016 at 1:15 PM, Kevin Mellott <kevin.r.mell...@gmail.com>
> wrote:
>
>> Thanks Nick - those examples will help a ton!!
>>
>> On Tue, Sep 20, 2016 at 12:20 PM, Nick Pentreath <
>> nick
documents 1 and 2 need to be compared to one
> another (via cosine similarity) because they both contain the token
> 'hockey'. I will investigate the methods that you recommended to see if
> they may resolve our problem.
>
> Thanks,
> Kevin
>
> On Tue, Sep 20, 2016 at 1:45 AM,
(cc'ing dev list also)
I think a more general version of ranking metrics that allows arbitrary
relevance scores could be useful. Ranking metrics are applicable to other
settings like search or other learning-to-rank use cases, so it should be a
little more generic than pure recommender settings.
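One such generalized ranking metric is NDCG, which accepts graded relevance scores rather than binary hit/miss judgments. A plain-Python sketch of the idea (not Spark code):

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: graded relevance discounted by rank position.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(ranked_relevances):
    # Normalize by the DCG of the ideal (best possible) ordering.
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

# Graded relevance scores (e.g. search ratings), not just binary hits:
print(ndcg([3.0, 2.0, 1.0]))  # 1.0 (already ideally ordered)
print(ndcg([3.0, 1.0, 2.0]))  # ~0.97 (second and third items swapped)
```

Because the relevance scores are arbitrary nonnegative reals, the same metric covers search, learning-to-rank, and recommender evaluation.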
How many products do you have? How large are your vectors?
It could be that SVD / LSA could be helpful. But if you have many products
then trying to compute all-pair similarity with brute force is not going to
be scalable. In this case you may want to investigate hashing (LSH)
techniques.
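A minimal sketch of one LSH flavor for cosine similarity, random-hyperplane (sign-random-projection) hashing, in plain Python. This illustrates the bucketing idea only, not any particular library's API; all names and dimensions are invented:

```python
import random

def signature(vec, planes):
    # Sign of the dot product with each random hyperplane; vectors with high
    # cosine similarity tend to agree on most signature bits.
    return tuple(1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
                 for plane in planes)

random.seed(42)
dim, n_planes = 4, 8
planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]

a = [1.0, 2.0, 3.0, 4.0]
b = [1.1, 2.1, 2.9, 4.2]      # nearly parallel to a
c = [-1.0, -2.0, -3.0, -4.0]  # points the opposite way

sig_a, sig_b, sig_c = (signature(v, planes) for v in (a, b, c))

# Bucket vectors by signature and compare candidates only within a bucket,
# instead of the O(n^2) all-pairs brute-force pass.
print(sig_a, sig_b, sig_c)
```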
On
Try als.setCheckpointInterval (
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.recommendation.ALS@setCheckpointInterval(checkpointInterval:Int):ALS.this.type
)
On Mon, 19 Sep 2016 at 20:01 Roshani Nagmote
wrote:
> Hello Sean,
>
> Can
The PR already exists for adding RankingEvaluator to ML -
https://github.com/apache/spark/pull/12461. I need to revive and review it.
DB, your review would be welcome too (and also on
https://github.com/apache/spark/issues/12574 which has implications for the
semantics of ranking metrics in the
Could you create a JIRA ticket for it?
https://issues.apache.org/jira/browse/SPARK
On Thu, 8 Sep 2016 at 07:50 evanzamir wrote:
> When I am trying to use LinearRegression, it seems that unless there is a
> column specified with weights, it will raise a py4j error. Seems
lia...@gmail.com> wrote:
>>
>>> This sounds good to me, and it will make ML examples more neatly.
>>>
>>> 2016-04-14 5:28 GMT-07:00 Nick Pentreath <nick.pentre...@gmail.com>:
>>>
>>>> Hey Spark devs
>>>>
>>>>
[
https://issues.apache.org/jira/browse/SPARK-17479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478058#comment-15478058
]
Nick Pentreath commented on SPARK-17479:
I just ran Scala, Java and Python examples of {{ml
[
https://issues.apache.org/jira/browse/SPARK-17479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478050#comment-15478050
]
Nick Pentreath commented on SPARK-17479:
I do see the data file:
https://github.com/apache/spark
[
https://issues.apache.org/jira/browse/SYSTEMML-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15477670#comment-15477670
]
Nick Pentreath commented on SYSTEMML-903:
-
cc [~deron] [~niketanpansare]
> [Python API] Spa
[
https://issues.apache.org/jira/browse/SYSTEMML-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SYSTEMML-903:
Description:
Hitting this exception when doing something (admittedly trivial) in Python
[
https://issues.apache.org/jira/browse/SYSTEMML-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SYSTEMML-903:
Summary: [Python API] Sparse to dense conversion is not yet implemented
(was: [Python
Nick Pentreath created SYSTEMML-903:
---
Summary: [Python API] Sparse to dense conversion is not yet
implemented
Key: SYSTEMML-903
URL: https://issues.apache.org/jira/browse/SYSTEMML-903
Project
You can use a udf like this:
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/
Using Python version 2.7.12 (default, Jul 2 2016 17:43:17)
SparkSession available as 'spark'.
In [1]: from
[
https://issues.apache.org/jira/browse/SPARK-17094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471563#comment-15471563
]
Nick Pentreath commented on SPARK-17094:
It's true that constructor doesn't exist. It could
That does seem strange. Can you provide an example to reproduce?
On Tue, 6 Sep 2016 at 21:49 evanzamir wrote:
> Am I misinterpreting what r2() in the LinearRegression Model summary means?
> By definition, R^2 should never be a negative number!
>
>
>
> --
> View this
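For reference, R^2 = 1 - SS_res / SS_tot, and on held-out predictions it can indeed go negative whenever the model does worse than always predicting the label mean. A plain-Python sketch (the numbers are made up):

```python
def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)               # spread around the mean
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # model error
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0]
bad = [3.0, 3.0, 3.0]  # worse than just predicting the mean (2.0)
print(r2(y_true, y_true))  # 1.0 (perfect predictions)
print(r2(y_true, bad))     # -1.5 (negative: worse than the mean predictor)
```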
[
https://issues.apache.org/jira/browse/SPARK-17400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15466957#comment-15466957
]
Nick Pentreath commented on SPARK-17400:
Could you explain further why you want to min-max scale
[
https://issues.apache.org/jira/browse/SPARK-17400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464360#comment-15464360
]
Nick Pentreath edited comment on SPARK-17400 at 9/5/16 7:42 AM:
Can you
[
https://issues.apache.org/jira/browse/SPARK-17400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464360#comment-15464360
]
Nick Pentreath commented on SPARK-17400:
Can you comment more on the performance issue - are you
at 15:37 Nick Pentreath <nick.pentre...@gmail.com> wrote:
> Right now you are correct that Spark ML APIs do not support predicting on
> a single instance (whether Vector for the models or a Row for a pipeline).
>
> See https://issues.apache.org/jira/browse/SPARK-10413 and
> http
Right now you are correct that Spark ML APIs do not support predicting on a
single instance (whether Vector for the models or a Row for a pipeline).
See https://issues.apache.org/jira/browse/SPARK-10413 and
https://issues.apache.org/jira/browse/SPARK-16431 (duplicate) for some
discussion.
There
Try this:
val df = spark.createDataFrame(Seq(Vectors.dense(10.0, 590.0, 190.0,
700.0)).map(Tuple1.apply)).toDF("features")
On Sun, 28 Aug 2016 at 11:06 yaroslav wrote:
> Hi,
>
> We use such kind of logic for training our model
>
> val model = new
y and
>> it will work. I was wondering if I could do this in Spark/Scala with my
>> limited knowledge
>>
>> Cheers
what is "text"? i.e. what is the "val text = ..." definition?
If text is a String itself then indeed sc.parallelize(Array(text)) is doing
the correct thing in this case.
On Tue, 23 Aug 2016 at 19:42 Mich Talebzadeh
wrote:
> I am sure someone know this :)
>
> Created
It's not impossible for a Transformer to output multiple columns - it's
simply that none of the current ones do. It's true that it might be a
relatively less common use case in general.
But take StringIndexer for example. It turns strings (categorical features)
into ints (0-based indexes).
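The fit step of StringIndexer can be sketched in plain Python; this mimics the default frequency-descending ordering (most frequent label gets index 0), and is an illustration, not the actual Spark implementation:

```python
from collections import Counter

def fit_string_indexer(values):
    # Order labels by descending frequency; most frequent gets index 0.
    ordered = [label for label, _ in Counter(values).most_common()]
    return {label: float(i) for i, label in enumerate(ordered)}

labels = ["cat", "dog", "cat", "fish", "cat", "dog"]
index = fit_string_indexer(labels)
print(index)                       # {'cat': 0.0, 'dog': 1.0, 'fish': 2.0}
print([index[x] for x in labels])  # [0.0, 1.0, 0.0, 2.0, 0.0, 1.0]
```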
[
https://issues.apache.org/jira/browse/SPARK-13030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431319#comment-15431319
]
Nick Pentreath edited comment on SPARK-13030 at 8/22/16 6:08 PM:
-
Yes I
[
https://issues.apache.org/jira/browse/SPARK-13030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431319#comment-15431319
]
Nick Pentreath commented on SPARK-13030:
Yes I also agree OHE needs to be an {{Estimator
I believe it may be because of this issue (
https://issues.apache.org/jira/browse/SPARK-13030). OHE is not an estimator
- hence in cases where the number of categories differ between train and
test, it's not usable in the current form.
It's tricky to work around, though one option is to use
[
https://issues.apache.org/jira/browse/SPARK-15113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath resolved SPARK-15113.
Resolution: Fixed
Fix Version/s: 2.1.0
Issue resolved by pull request 12889
[https
[
https://issues.apache.org/jira/browse/SPARK-15113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Pentreath updated SPARK-15113:
---
Assignee: holdenk
> Add missing numFeatures & numClasses to wrapped JavaClassificati