[mllib] Are there any bugs when dividing a Breeze sparse vector in Spark v1.3.0-rc3?
Hi all,

Are there any bugs when dividing a Breeze sparse vector in Spark v1.3.0-rc3? When I divide a sparse vector in Spark v1.3.0-rc3, I get a wrong result if the vector contains any explicit zero values. Spark v1.3.0-rc3 depends on Breeze v0.11.1, and Breeze v0.11.1 seems to have a bug when dividing a sparse vector by a scalar: if the sparse vector contains any zero values, the result is an all-zero vector. The same code runs correctly on Spark v1.2.x, and multiplying a Breeze sparse vector works fine. I reported this problem to the Breeze community in the issue below.

https://github.com/scalanlp/breeze/issues/382

For example:

```
test("dividing a breeze sparse vector") {
  val vec = Vectors.sparse(6, Array(0, 4), Array(0.0, 10.0)).toBreeze
  val n = 60.0
  val answer1 = vec :/ n
  val answer2 = vec.toDenseVector :/ n
  println(vec)
  println(answer1)
  println(answer2)
  assert(answer1.toDenseVector === answer2)
}
```

Output:

```
SparseVector((0,0.0), (4,10.0))
SparseVector()
DenseVector(0.0, 0.0, 0.0, 0.0, 0.1, 0.0)
org.scalatest.exceptions.TestFailedException: DenseVector(0.0, 0.0, 0.0, 0.0, 0.0, 0.0) did not equal DenseVector(0.0, 0.0, 0.0, 0.0, 0.1, 0.0)
```

Thanks,
Yu Ishikawa

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/mllib-Is-there-any-bugs-to-divide-a-Breeze-sparse-vectors-at-Spark-v1-3-0-rc3-tp11056.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
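Until the Breeze fix lands, the thread's own output suggests a stopgap: the dense code path divides correctly, so densifying before dividing works when the vector fits in memory. A minimal sketch using plain Breeze 0.11.x types (the constructor call mirrors the test's vector; not MLlib's `Vectors` API):

```scala
import breeze.linalg.{DenseVector, SparseVector}

object DivideWorkaround {
  def main(args: Array[String]): Unit = {
    // Same vector as in the failing test: an explicit zero at index 0.
    val vec = new SparseVector(Array(0, 4), Array(0.0, 10.0), 6)
    val n = 60.0

    // Workaround: densify first, then divide. This avoids the buggy
    // SparseVector :/ path at the cost of materializing all entries.
    val divided: DenseVector[Double] = vec.toDenseVector :/ n

    println(divided)
  }
}
```

This trades memory for correctness, so it is only viable for vectors of modest dimension.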
Re: [mllib] Are there any bugs when dividing a Breeze sparse vector in Spark v1.3.0-rc3?
David Hall, a creator of Breeze, told me that it's a bug, so I filed a JIRA ticket for this issue. We need to upgrade Breeze from 0.11.1 to 0.11.2 or later in order to pick up the fix, once the new version of Breeze is released.

[SPARK-6341] Upgrade breeze from 0.11.1 to 0.11.2 or later - ASF JIRA
https://issues.apache.org/jira/browse/SPARK-6341

Thanks,
Yu Ishikawa
Re: [mllib] Are there any bugs when dividing a Breeze sparse vector in Spark v1.3.0-rc3?
It's a bug on Breeze's side. Once David fixes it and publishes it to Maven, we can upgrade to Breeze 0.11.2. Please file a JIRA ticket for this issue. Thanks.

Sincerely,

DB Tsai
---
Blog: https://www.dbtsai.com

On Sun, Mar 15, 2015 at 12:45 AM, Yu Ishikawa yuu.ishikawa+sp...@gmail.com wrote: [quoted message snipped]
Re: Wrong version on the Spark documentation page
When I enter http://spark.apache.org/docs/latest/ into the Chrome address bar, I see 1.3.0.

Cheers

On Sun, Mar 15, 2015 at 11:12 AM, Patrick Wendell pwend...@gmail.com wrote: [quoted message snipped]
Wrong version on the Spark documentation page
It's still marked as 1.2.1 here: http://spark.apache.org/docs/latest/

But this page is updated (1.3.0): http://spark.apache.org/docs/latest/index.html

Cheng
Re: Wrong version on the Spark documentation page
Cheng - what if you hold Shift+Refresh? For me the /latest link correctly points to 1.3.0.

On Sun, Mar 15, 2015 at 10:40 AM, Cheng Lian lian.cs@gmail.com wrote: [quoted message snipped]
Re: broadcast hang out
Cross-region as in different data centers?

- Mridul

On Sun, Mar 15, 2015 at 8:08 PM, lonely Feb lonely8...@gmail.com wrote: [quoted message snipped]
Re: broadcast hang out
Yes.

2015-03-16 11:43 GMT+08:00 Mridul Muralidharan mri...@gmail.com: [quoted message snipped]
Re: broadcast hang out
Thanks. But this method is in BlockTransferService.scala in Spark itself, which I cannot replace unless I rewrite the core code. I wonder if the timeout is already handled somewhere.

2015-03-16 11:27 GMT+08:00 Chester Chen ches...@alpinenow.com: [quoted message snipped]
Re: [mllib] Are there any bugs when dividing a Breeze sparse vector in Spark v1.3.0-rc3?
Snapshot is pushed. If you verify it, I'll publish the new artifacts.

On Sun, Mar 15, 2015 at 1:14 AM, Yu Ishikawa yuu.ishikawa+sp...@gmail.com wrote: [quoted message snipped]
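To try the pushed snapshot before the release, one would typically add the Sonatype snapshots resolver to the build. A build.sbt sketch; the exact snapshot version string is an assumption, so check the snapshots repository for what was actually published:

```scala
// build.sbt sketch for verifying the Breeze snapshot.
resolvers += "Sonatype Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots/"

// "0.11.2-SNAPSHOT" is an assumed version string, not confirmed by the thread.
libraryDependencies += "org.scalanlp" %% "breeze" % "0.11.2-SNAPSHOT"
```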
Re: SparkSQL 1.3.0 (RC3) failed to read parquet file generated by 1.1.1
Thanks!

On Sat, Mar 14, 2015 at 3:31 AM, Michael Armbrust mich...@databricks.com wrote:
Here is the JIRA: https://issues.apache.org/jira/browse/SPARK-6315

On Thu, Mar 12, 2015 at 11:00 PM, Michael Armbrust mich...@databricks.com wrote:
We are looking at the issue and will likely fix it for Spark 1.3.1.

On Thu, Mar 12, 2015 at 8:25 PM, giive chen thegi...@gmail.com wrote:
Hi all, my team has the same issue. It looks like Spark 1.3's SparkSQL cannot read Parquet files generated by Spark 1.1. This will cost us a lot of migration work when we want to upgrade to Spark 1.3. Can anyone help me? Thanks, Wisely Chen

On Tue, Mar 10, 2015 at 5:06 PM, Pei-Lun Lee pl...@appier.com wrote:
Hi, I found that if I try to read a Parquet file generated by Spark 1.1.1 using 1.3.0-rc3 with default settings, I get this error:

```
com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'StructType': was expecting ('true', 'false' or 'null')
 at [Source: StructType(List(StructField(a,IntegerType,false))); line: 1, column: 11]
	at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1419)
	at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:508)
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._reportInvalidToken(ReaderBasedJsonParser.java:2300)
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleOddValue(ReaderBasedJsonParser.java:1459)
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:683)
	at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3105)
	at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3051)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2161)
	at org.json4s.jackson.JsonMethods$class.parse(JsonMethods.scala:19)
	at org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:44)
	at org.apache.spark.sql.types.DataType$.fromJson(dataTypes.scala:41)
	at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$readSchema$1$$anonfun$25.apply(newParquet.scala:675)
	at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$readSchema$1$$anonfun$25.apply(newParquet.scala:675)
```

This is how I saved the Parquet file with 1.1.1:

```
sql("select 1 as a").saveAsParquetFile("/tmp/foo")
```

And this is the metadata of the 1.1.1 Parquet file:

```
creator: parquet-mr version 1.4.3
extra:   org.apache.spark.sql.parquet.row.metadata = StructType(List(StructField(a,IntegerType,false)))
```

By comparison, this is the 1.3.0 metadata:

```
creator: parquet-mr version 1.6.0rc3
extra:   org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"a","type":"integer","nullable":t [more]...
```

It looks like ParquetRelation2 is now used to load Parquet files by default, and it only recognizes the JSON schema format, while the 1.1.1 schema was in the case-class string format. Setting spark.sql.parquet.useDataSourceApi to false fixes it, but I don't know the differences. Is this considered a bug? We have a lot of Parquet files from 1.1.1; should we disable the data source API in order to read them if we want to upgrade to 1.3?

Thanks,
--
Pei-Lun
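As a stopgap, the flag Pei-Lun mentions can be set programmatically before reading the old files. A sketch against the Spark 1.3-era SQLContext API; the path and app name are illustrative, not from the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ReadOldParquet {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("parquet-compat").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)

    // Fall back to the pre-1.3 Parquet code path, which understands the
    // old case-class-string schema stored in 1.1.x footer metadata.
    sqlContext.setConf("spark.sql.parquet.useDataSourceApi", "false")

    // Illustrative path: point this at a file written by Spark 1.1.x.
    val df = sqlContext.parquetFile("/tmp/foo")
    df.printSchema()

    sc.stop()
  }
}
```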
broadcast hang out
Hi all, I've run into a problem where torrent broadcast hangs in my Spark cluster (1.2, standalone), particularly badly when the driver and executors are cross-region. Reading the broadcast code, I found a synchronous block read here:

```
def fetchBlockSync(host: String, port: Int, execId: String, blockId: String): ManagedBuffer = {
  // A monitor for the thread to wait on.
  val result = Promise[ManagedBuffer]()
  fetchBlocks(host, port, execId, Array(blockId),
    new BlockFetchingListener {
      override def onBlockFetchFailure(blockId: String, exception: Throwable): Unit = {
        result.failure(exception)
      }
      override def onBlockFetchSuccess(blockId: String, data: ManagedBuffer): Unit = {
        val ret = ByteBuffer.allocate(data.size.toInt)
        ret.put(data.nioByteBuffer())
        ret.flip()
        result.success(new NioManagedBuffer(ret))
      }
    })
  Await.result(result.future, Duration.Inf)
}
```

It seems that the fetchBlockSync method does not have a timeout limit but waits forever. Can anybody show me how to control the timeout here?
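The hang can be reproduced without any Spark code: an uncompleted Promise models a fetch whose callbacks never fire, and `Await.result` with `Duration.Inf` would block on it forever, while a finite timeout turns the hang into a catchable exception. A minimal self-contained sketch:

```scala
import scala.concurrent.{Await, Promise, TimeoutException}
import scala.concurrent.duration._

object TimeoutDemo {
  def main(args: Array[String]): Unit = {
    // A promise that is never completed stands in for a block fetch
    // whose remote side never responds.
    val never = Promise[Int]()
    try {
      // With Duration.Inf this line would block forever; a finite
      // timeout surfaces a TimeoutException after 2 seconds instead.
      Await.result(never.future, 2.seconds)
    } catch {
      case _: TimeoutException =>
        println("fetch timed out; caller can retry or fail the task")
    }
  }
}
```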
Re: broadcast hang out
Can you just replace Duration.Inf with a shorter duration? How about:

```
import scala.concurrent.duration._
val timeout = new Timeout(10 seconds)
Await.result(result.future, timeout.duration)
```

or

```
val timeout = new FiniteDuration(10, TimeUnit.SECONDS)
Await.result(result.future, timeout)
```

or simply

```
import scala.concurrent.duration._
Await.result(result.future, 10 seconds)
```

On Sun, Mar 15, 2015 at 8:08 PM, lonely Feb lonely8...@gmail.com wrote: [quoted message snipped]
Re: broadcast hang out
Can anyone help? Thanks a lot!

2015-03-16 11:45 GMT+08:00 lonely Feb lonely8...@gmail.com: [quoted message snipped]