[ 
https://issues.apache.org/jira/browse/SPARK-11992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15027304#comment-15027304
 ] 

Alberto Bonsanto edited comment on SPARK-11992 at 11/25/15 7:25 PM:
--------------------------------------------------------------------

[~srowen] Hello, I appreciate the time you spent commenting on my issue. This 
is my first time trying to ask about and explain something in Jira, and I am 
seriously lost. Is there a guide or something I can read so that I can 
formulate my question more properly and avoid disturbing the busy Spark 
community? 


was (Author: bonsanto):
[~srowen] Hello, I appreciate your time commenting on my issue. This is my 
first time trying to ask about and explain something in Jira, and I am 
seriously lost. Is there a guide or something I can read so that I can 
formulate my question more properly and avoid disturbing the busy Spark 
community? 

> Several numbers in my spark shell (pyspark)
> -------------------------------------------
>
>                 Key: SPARK-11992
>                 URL: https://issues.apache.org/jira/browse/SPARK-11992
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib, PySpark
>    Affects Versions: 1.5.2
>         Environment: Linux Ubuntu 14.04 LTS
> Jupyter 
> Spark 1.5.2
>            Reporter: Alberto Bonsanto
>            Priority: Critical
>              Labels: newbie
>
> The problem is very weird. I am currently trying to fit several classifiers 
> from the mllib library (SVM, LogisticRegression, RandomForest, DecisionTree 
> and NaiveBayes) so that they classify the data properly, and I want to compare 
> their performance by evaluating their predictions on my validation data (the 
> typical pipeline). The problem is that when I try to fit any of them, my 
> spark-shell console prints millions and millions of entries, and after that 
> the fitting process stops. You can see it 
> [here|http://i.imgur.com/mohLnwr.png].
> Some details:
> - My data has around 15M entries.
> - I use LabeledPoints to represent each entry; the features are SparseVectors 
> with *104* features (dimensions).
> - I keep console output to a minimum; see my 
> [log4j.properties|https://gist.github.com/Bonsanto/c487624db805f56882b8].
> - The program is running locally on a computer with 16GB of RAM. 
> I have already asked this on StackOverflow; you can see it here: [Crazy 
> print|http://stackoverflow.com/questions/33807347/pyspark-shell-outputs-several-numbers-instead-of-the-loading-arrow]
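
Since the report centers on console flooding and links a log4j.properties, it may help to show what a quiet configuration looks like. The sketch below follows the style of the log4j.properties template bundled with Spark 1.x; the specific settings are illustrative assumptions, not the reporter's actual gist, and the raw stream of numbers in the linked screenshot may not come from log4j at all:

```properties
# Illustrative sketch of a quiet Spark 1.x log4j.properties (assumed settings,
# not the reporter's file): send only WARN and above to the console.
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Quiet the chatty REPL logger as well.
log4j.logger.org.apache.spark.repl.Main=WARN
```

If output like the screenshot persists with a configuration along these lines, the flood is likely being written directly to stdout rather than through the logging framework.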



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
