number of partitions for hive schemaRDD
Hi all,

I'm trying SparkSQL with HiveContext. I execute an hql query like the following:

---
val ctx = new org.apache.spark.sql.hive.HiveContext(sc)
import ctx._
val queries = ctx.hql("select keyword from queries where dt = '2015-02-01' limit 1000")
---

It seems that the number of partitions of the resulting SchemaRDD is set to 1. Is this the specified behavior for SchemaRDD, SparkSQL, and HiveContext? Is there any way to set the number of partitions to an arbitrary value, other than an explicit repartition?

Masaki Rikitoku
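A minimal sketch of inspecting and changing the partition count, written against the Spark 1.2-era API used above; the setConf knob and the single-partition behavior of LIMIT are my reading of that version, not something stated in the original thread:

---
// SchemaRDD extends RDD[Row], so ordinary RDD introspection applies.
val queries = ctx.hql("select keyword from queries where dt = '2015-02-01' limit 1000")
println(queries.partitions.length)   // 1: LIMIT gathers everything into one partition

// Explicit repartition spreads the rows back out, at the cost of a shuffle.
val spread = queries.repartition(16)
println(spread.partitions.length)    // 16

// For queries that already shuffle (joins, aggregations, ORDER BY),
// the parallelism can be configured instead of repartitioning:
ctx.setConf("spark.sql.shuffle.partitions", "16")
---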
Re: IDF for ml pipeline
Thank you for your reply. I will do it.

Sent from Mailbox

On Tue, Feb 3, 2015 at 6:12 PM, Xiangrui Meng men...@gmail.com wrote:

> Yes, we need a wrapper under spark.ml. Feel free to create a JIRA for it. -Xiangrui
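For reference, such a wrapper was later added to Spark itself as org.apache.spark.ml.feature.IDF, an Estimator that fits inside a Pipeline. A minimal usage sketch against that newer DataFrame-based API; the column names and stage settings here are illustrative:

---
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}

// Tokenize, hash to term frequencies, rescale by IDF, then train.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol("words").setOutputCol("rawFeatures")
val idf = new IDF().setInputCol("rawFeatures").setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10)

val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, idf, lr))
// val model = pipeline.fit(trainingDF)  // trainingDF: DataFrame with "text" and "label" columns
---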
IDF for ml pipeline
Hi all,

I am trying the ml pipeline for text classification. Recently I succeeded in executing the pipeline processing in the ml package, consisting of an original Japanese tokenizer, HashingTF, and LogisticRegression. Then I failed to execute the pipeline with the idf from the mllib package directly. To use the idf feature in the ml package, do I have to implement a wrapper for idf in the ml package, like the one for HashingTF?

best

Masaki Rikitoku
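Until such a wrapper exists, one workaround is to apply mllib's IDF between featurization and training, outside the Pipeline. A minimal sketch assuming pre-tokenized documents; the docs RDD and the feature count are made up for illustration:

---
import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Hypothetical pre-tokenized input (e.g. output of a Japanese tokenizer).
val docs: RDD[Seq[String]] = sc.parallelize(Seq(
  Seq("spark", "ml", "pipeline"),
  Seq("idf", "wrapper", "spark")))

val tf: RDD[Vector] = new HashingTF(numFeatures = 1 << 18).transform(docs)
tf.cache()  // IDF.fit and IDFModel.transform each make a pass over the data

val idfModel = new IDF().fit(tf)                 // learns document frequencies
val tfidf: RDD[Vector] = idfModel.transform(tf)  // rescales each TF vector
---

The tfidf vectors can then be zipped with labels into LabeledPoints and fed to an mllib classifier such as LogisticRegressionWithLBFGS.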