[ https://issues.apache.org/jira/browse/SPARK-11992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15027304#comment-15027304 ]
Alberto Bonsanto commented on SPARK-11992:
------------------------------------------

[~srowen] Hello, I appreciate you taking the time to comment on my issue. This is my first time asking a question and reporting something in JIRA, and I am quite lost. Is there a guide or something I can read so that I can formulate my question more properly and avoid disturbing the busy Spark community?

> Several numbers in my spark shell (pyspark)
> -------------------------------------------
>
> Key: SPARK-11992
> URL: https://issues.apache.org/jira/browse/SPARK-11992
> Project: Spark
> Issue Type: Question
> Components: MLlib, PySpark
> Affects Versions: 1.5.2
> Environment: Linux Ubuntu 14.04 LTS
> Jupyter
> Spark 1.5.2
> Reporter: Alberto Bonsanto
> Priority: Blocker
> Labels: newbie
>
> The problem is very weird. I am currently trying to fit some classifiers from the MLlib library (SVM, LogisticRegression, RandomForest, DecisionTree and NaiveBayes) so that they classify the data properly, and I want to compare their performance by evaluating their predictions on my current validation data (the typical pipeline). The problem is that when I try to fit any of them, my spark-shell console prints millions and millions of entries, and after that the fitting process stops. You can see it [here|http://i.imgur.com/mohLnwr.png].
> Some details:
> - My data has around 15M entries.
> - I use LabeledPoints to represent each entry, where the features are SparseVectors with *104* features or dimensions.
> - I don't show many things in the console: [log4j.properties|https://gist.github.com/Bonsanto/c487624db805f56882b8]
> - The program is running locally on a computer with 16GB of RAM.
> I have already asked this on StackOverflow; you can see it here: [Crazy print|http://stackoverflow.com/questions/33807347/pyspark-shell-outputs-several-numbers-instead-of-the-loading-arrow]