Re: pyspark vector
Well, the 3 in this case is the size of the sparse vector. This equals the number of features, which for CountVectorizer (I assume that's what you're using) is also the vocabulary size (the number of unique terms kept in the vocabulary).

On Tue, 25 Apr 2017 at 04:06, Peyman Mohajerian wrote:
> setVocabSize
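To make the point concrete, here is a minimal pure-Python sketch of what CountVectorizer does conceptually. It is not Spark's implementation, and the helper name `count_vectorize` is hypothetical. It assumes the vocabulary is ordered by total term count across the corpus (most frequent first), with ties broken alphabetically; the tie-breaking rule is an assumption for illustration. The optional `vocab_size` argument mimics the effect of CountVectorizer's vocabSize parameter by keeping only the most frequent terms, which also shrinks the vector size.

```python
from collections import Counter


def count_vectorize(docs, vocab_size=None):
    """Encode token lists as (size, indices, values) triples.

    Hypothetical helper illustrating CountVectorizer's output format;
    not Spark's implementation.
    """
    # Total count of each term across the whole corpus.
    corpus_counts = Counter(term for doc in docs for term in doc)
    # Most frequent terms first; ties broken alphabetically (an assumption).
    ordered = sorted(corpus_counts, key=lambda t: (-corpus_counts[t], t))
    if vocab_size is not None:
        ordered = ordered[:vocab_size]
    vocab = {term: i for i, term in enumerate(ordered)}

    vectors = []
    for doc in docs:
        # Count only the terms that made it into the vocabulary.
        doc_counts = Counter(vocab[t] for t in doc if t in vocab)
        indices = sorted(doc_counts)
        values = [float(doc_counts[i]) for i in indices]
        # The first element is the vector size, i.e. the vocabulary size.
        vectors.append((len(vocab), indices, values))
    return ordered, vectors


docs = [["a", "b", "c"], ["a", "b", "b", "c", "a"]]
vocab, vectors = count_vectorize(docs)
print(vocab)  # ['a', 'b', 'c']
for v in vectors:
    print(v)
# (3, [0, 1, 2], [1.0, 1.0, 1.0])
# (3, [0, 1, 2], [2.0, 2.0, 1.0])

# Capping the vocabulary (like vocabSize) shrinks every vector.
_, capped = count_vectorize(docs, vocab_size=2)
print(capped)  # [(2, [0, 1], [1.0, 1.0]), (2, [0, 1], [2.0, 2.0])]
```

With three distinct terms the vocabulary has three entries, so every vector has size 3; with the cap at 2, "c" is dropped and both vectors become size 2.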
Re: pyspark vector
setVocabSize

On Mon, Apr 24, 2017 at 5:36 PM, Zeming Yu wrote:
> What does the 3 mean in the (3,[0,1,2],[1.0,1.0,1.0])?
pyspark vector
Hi all,

Beginner question:

What does the 3 mean in the (3,[0,1,2],[1.0,1.0,1.0])?

https://spark.apache.org/docs/2.1.0/ml-features.html

id | texts                          | vector
---|--------------------------------|--------------------------
0  | Array("a", "b", "c")           | (3,[0,1,2],[1.0,1.0,1.0])
1  | Array("a", "b", "b", "c", "a") | (3,[0,1,2],[2.0,2.0,1.0])
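For reference, the printed notation is a sparse vector written as (size, [indices], [values]): the first number is the length of the full vector, the second list gives the positions of the non-zero entries, and the third list gives the values at those positions. A minimal pure-Python sketch that expands such a triple into a dense list follows; the helper name `to_dense` is hypothetical (in PySpark, SparseVector.toArray() does the equivalent and returns a NumPy array).

```python
def to_dense(size, indices, values):
    """Expand a (size, indices, values) sparse triple into a dense list.

    Hypothetical helper for illustration only.
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    dense = [0.0] * size  # 'size' is the full length of the vector
    for i, v in zip(indices, values):
        if not 0 <= i < size:
            raise IndexError(f"index {i} out of range for size {size}")
        dense[i] = v
    return dense


print(to_dense(3, [0, 1, 2], [1.0, 1.0, 1.0]))  # [1.0, 1.0, 1.0]
print(to_dense(3, [0, 1, 2], [2.0, 2.0, 1.0]))  # [2.0, 2.0, 1.0]
# A size-5 vector with only two non-zero entries:
print(to_dense(5, [1, 3], [4.0, 7.0]))  # [0.0, 4.0, 0.0, 7.0, 0.0]
```

In the table above every index is populated, so the sparse and dense forms carry the same numbers; the sparse form only saves space when most entries are zero, which is the usual case for a large vocabulary.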