Re: spark streaming 1.3 kafka topic error
Dears,

I need to commit the DB transaction once per partition, not once per row. The code below didn't work for me:

    rdd.mapPartitions(partitionOfRecords => {
      DBConnectionInit()
      val results = partitionOfRecords.map(..)
      DBConnection.commit()
    })

Best regards,
Ahmed Atef Nawwar
Data Management
Big Data Consultant

On Thu, Aug 27, 2015 at 4:16 PM, Cody Koeninger c...@koeninger.org wrote:

Your Kafka broker died or you otherwise had a rebalance. Normally Spark retries take care of that. Is there something going on with your Kafka installation that makes the rebalance take especially long? Yes, increasing the backoff / max number of retries will help, but it's better to figure out what's going on with Kafka.

On Wed, Aug 26, 2015 at 9:07 PM, Shushant Arora shushantaror...@gmail.com wrote:

Hi,

My streaming application gets killed with the error below:

15/08/26 21:55:20 ERROR kafka.DirectKafkaInputDStream: ArrayBuffer(kafka.common.NotLeaderForPartitionException, kafka.common.NotLeaderForPartitionException, kafka.common.NotLeaderForPartitionException, kafka.common.NotLeaderForPartitionException, kafka.common.NotLeaderForPartitionException, org.apache.spark.SparkException: Couldn't find leader offsets for Set([testtopic,223], [testtopic,205], [testtopic,64], [testtopic,100], [testtopic,193]))
15/08/26 21:55:20 ERROR scheduler.JobScheduler: Error generating jobs for time 144062612 ms
org.apache.spark.SparkException: ArrayBuffer(kafka.common.NotLeaderForPartitionException, org.apache.spark.SparkException: Couldn't find leader offsets for Set([testtopic,115]))
    at org.apache.spark.streaming.kafka.DirectKafkaInputDStream.latestLeaderOffsets(DirectKafkaInputDStream.scala:94)
    at org.apache.spark.streaming.kafka.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:116)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:299)
    at ...

Kafka params printed in the job logs are:

value.serializer = class org.apache.kafka.common.serialization.StringSerializer
key.serializer = class org.apache.kafka.common.serialization.StringSerializer
block.on.buffer.full = true
retry.backoff.ms = 100
buffer.memory = 1048576
batch.size = 16384
metrics.sample.window.ms = 3
metadata.max.age.ms = 30
receive.buffer.bytes = 32768
timeout.ms = 3
max.in.flight.requests.per.connection = 5
bootstrap.servers = [broker1:9092, broker2:9092, broker3:9092]
metric.reporters = []
client.id =
compression.type = none
retries = 0
max.request.size = 1048576
send.buffer.bytes = 131072
acks = all
reconnect.backoff.ms = 10
linger.ms = 0
metrics.num.samples = 2
metadata.fetch.timeout.ms = 6

Is the Kafka broker going down and killing the job? What's the best way to handle it? Increasing retries and backoff time will help, but what should those values be set to so that the streaming application never fails - instead it keeps retrying every few seconds and sends an event so that my custom code can notify that a Kafka broker is down, if that is the cause.

Thanks
commit DB Transaction for each partition
Dears,

I need to commit the DB transaction once per partition, not once per row. The code below didn't work for me:

    rdd.mapPartitions(partitionOfRecords => {
      DBConnectionInit()
      val results = partitionOfRecords.map(..)
      DBConnection.commit()
      results
    })

Best regards,
Ahmed Atef Nawwar
Data Management
Big Data Consultant
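A minimal sketch of how this pattern is usually made to work. The problem with the snippet above is that partitionOfRecords.map(..) is lazy, so DBConnection.commit() runs before any row has been processed, and the rows are only pulled through the iterator later. Materializing the partition before committing avoids that. The JDBC URL, table, and INSERT statement below are hypothetical placeholders, not anything from this thread:

    import java.sql.DriverManager

    val jdbcUrl = "jdbc:postgresql://dbhost/mydb"        // hypothetical connection string

    val processed = rdd.mapPartitions { partitionOfRecords =>
      // One connection and one transaction per partition.
      val conn = DriverManager.getConnection(jdbcUrl)
      conn.setAutoCommit(false)
      val stmt = conn.prepareStatement("INSERT INTO events(value) VALUES (?)")
      val results = partitionOfRecords.map { record =>
        stmt.setString(1, record.toString)
        stmt.executeUpdate()
        record
      }.toList                  // force the lazy iterator so every row is written before the commit
      conn.commit()             // commit once per partition, not per row
      conn.close()
      results.iterator          // mapPartitions must return an Iterator
    }

Note that mapPartitions is still only a transformation; nothing runs until an action (foreach, count, saveAsTextFile, ...) is applied to processed, which is the point Cody makes later in this thread.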
Commit DB Transaction for each partition
Thanks for the foreach idea. But once I used it I got an empty RDD - I think because results is an iterator. Yes, I know map is lazy, but I expected there would be a way to force the action. I cannot use foreachPartition because I need to reuse the new RDD after some further maps.

On Thu, Aug 27, 2015 at 5:11 PM, Cody Koeninger c...@koeninger.org wrote:

Map is lazy. You need an actual action, or nothing will happen. Use foreachPartition, or do an empty foreach after the map.

On Thu, Aug 27, 2015 at 8:53 AM, Ahmed Nawar ahmed.na...@gmail.com wrote:

Dears,

I need to commit the DB transaction once per partition, not once per row. The code below didn't work for me:

    rdd.mapPartitions(partitionOfRecords => {
      DBConnectionInit()
      val results = partitionOfRecords.map(..)
      DBConnection.commit()
    })

Best regards,
Ahmed Atef Nawwar
Data Management
Big Data Consultant
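For the case Cody describes, where the per-partition writes are a terminal step and the mapped results do not need to live on as a new RDD, foreachPartition is the simpler shape. A hedged sketch, with a hypothetical JDBC URL and INSERT statement standing in for the real database code:

    import java.sql.DriverManager

    val jdbcUrl = "jdbc:postgresql://dbhost/mydb"   // hypothetical connection string

    rdd.foreachPartition { partitionOfRecords =>
      val conn = DriverManager.getConnection(jdbcUrl)
      conn.setAutoCommit(false)
      val stmt = conn.prepareStatement("INSERT INTO events(value) VALUES (?)")
      partitionOfRecords.foreach { record =>
        stmt.setString(1, record.toString)
        stmt.executeUpdate()
      }
      conn.commit()    // foreachPartition is an output action, so this actually executes
      conn.close()
    }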
Re: Commit DB Transaction for each partition
Yes, of course, I am doing that. But once I added results.foreach(row => {}) I got an empty RDD:

    rdd.mapPartitions(partitionOfRecords => {
      DBConnectionInit()
      val results = partitionOfRecords.map(..)
      DBConnection.commit()
      results.foreach(row => {})
      results
    })

On Thu, Aug 27, 2015 at 10:18 PM, Cody Koeninger c...@koeninger.org wrote:

You need to return an iterator from the closure you provide to mapPartitions.

On Thu, Aug 27, 2015 at 1:42 PM, Ahmed Nawar ahmed.na...@gmail.com wrote:

Thanks for the foreach idea. But once I used it I got an empty RDD - I think because results is an iterator. Yes, I know map is lazy, but I expected there would be a way to force the action. I cannot use foreachPartition because I need to reuse the new RDD after some further maps.
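The empty RDD follows from results being an Iterator: the inner results.foreach(row => {}) consumes it inside the closure, so the iterator handed back to Spark has nothing left. A sketch of the usual fix, assuming processRow(..) is a placeholder for the real per-row work and DBConnectionInit / DBConnection are the helpers from the earlier messages: buffer the partition into a collection, commit, then return a fresh iterator over the buffer.

    val newRDD = rdd.mapPartitions { partitionOfRecords =>
      DBConnectionInit()
      val results = partitionOfRecords.map(processRow).toList   // materialize: every row is processed here
      DBConnection.commit()                                      // commit after the whole partition is written
      results.iterator                                           // a fresh Iterator, so the returned RDD is not empty
    }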
Re: Commit DB Transaction for each partition
Thanks a lot for your support. It is working now. I wrote it like below:

    val newRDD = rdd.mapPartitions { partition => {
        val result = partition.map(.)
        result
      }
    }
    newRDD.foreach { _ => () }

On Thu, Aug 27, 2015 at 10:34 PM, Cody Koeninger c...@koeninger.org wrote:

This job contains a Spark output action, and is what I originally meant:

    rdd.mapPartitions { result }.foreach { }

This job is just a transformation, and won't do anything unless you have another output action. Not to mention, it will exhaust the iterator, as you noticed:

    rdd.mapPartitions { result.foreach result }
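Putting the whole thread together, a sketch of the final pattern with one caveat worth flagging: if newRDD is reused by later transformations, Spark may recompute the mapPartitions closure (and re-run the per-partition commits) unless the RDD is cached before the empty foreach. processRow(..) and the DB helpers are placeholders carried over from the earlier messages, and cache() only keeps partitions as long as memory allows:

    val newRDD = rdd.mapPartitions { partition =>
      DBConnectionInit()
      val result = partition.map(processRow).toList
      DBConnection.commit()
      result.iterator
    }.cache()                      // avoid redoing the DB writes when newRDD is used again

    newRDD.foreach { _ => () }     // output action that forces every partition to run once

    // newRDD can now feed further maps/filters without re-running the commits,
    // provided the cached partitions have not been evicted.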
Re: Data/File structure Validation
Dear Taotao,

Yes, I tried spark-csv.

Thanks,
Nawwar

On Mon, Mar 23, 2015 at 12:20 PM, Taotao.Li taotao...@datayes.com wrote:

Can it load successfully if the format is invalid?

From: Ahmed Nawar ahmed.na...@gmail.com
To: user@spark.apache.org
Sent: Monday, March 23, 2015, 4:48:54 PM
Subject: Data/File structure Validation

Dears,

Is there any way to validate CSV, JSON, etc. files while loading them into a DataFrame? I need to ignore corrupted rows (rows that do not match the schema).

Thanks,
Ahmed Nawwar

--
Thanks & Best regards

李涛涛 Taotao Li | Fixed Income@Datayes | Software Engineer
Address: Wanxiang Tower 8F, Lujiazui West Rd. No. 99, Pudong New District, Shanghai, 200120
Phone: 021-60216502
Mobile: +86-18202171279
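For reference, a hedged sketch of what "ignore corrupted rows" can look like with spark-csv and the built-in JSON reader. The file names and options are illustrative, the entry points shown are the later DataFrameReader API (Spark 1.4+) rather than whatever exact versions were used in this thread, and DROPMALFORMED assumes a spark-csv release that supports the mode option:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // spark-csv: silently drop rows whose column count does not match the header/schema.
    val csvDf = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .option("mode", "DROPMALFORMED")
      .load("people.csv")

    // JSON: malformed lines normally surface in a _corrupt_record column, which can be filtered out.
    val jsonDf = sqlContext.read.json("events.json")
    val cleanJson =
      if (jsonDf.columns.contains("_corrupt_record"))
        jsonDf.filter(jsonDf("_corrupt_record").isNull).drop("_corrupt_record")
      else jsonDf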
Re: Data/File structure Validation
Dear Raunak,

The source system provides logs that contain some errors. I need to make sure each row is in the correct format (the number of columns/attributes and the data types are correct) and move incorrect rows to a separate list. Of course I can write my own logic, but I wanted to make sure there is no direct way.

Thanks,
Nawwar

On Mon, Mar 23, 2015 at 1:14 PM, Raunak Jhawar raunak.jha...@gmail.com wrote:

CSV is a structured format and JSON is not (it is semi-structured), so it is normal for different JSON documents to have differing schemas. What are you trying to do here?

--
Thanks,
Raunak Jhawar
m: 09820890034

On Mon, Mar 23, 2015 at 2:18 PM, Ahmed Nawar ahmed.na...@gmail.com wrote:

Dears,

Is there any way to validate CSV, JSON, etc. files while loading them into a DataFrame? I need to ignore corrupted rows (rows that do not match the schema).

Thanks,
Ahmed Nawwar
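Since the requirement here is to keep the bad rows in a separate collection rather than silently drop them, a hand-rolled split is often the most direct route. A sketch on plain RDDs; the file name, comma delimiter, and three-column (String, Int, Double) layout are purely illustrative assumptions:

    import scala.util.{Try, Success, Failure}

    case class Record(id: String, count: Int, amount: Double)

    val lines = sc.textFile("source_logs.csv")

    // Attempt to parse every line, keeping the raw text next to the parse result.
    val parsed = lines.map { line =>
      val cols = line.split(",", -1)
      (line, Try(Record(cols(0), cols(1).trim.toInt, cols(2).trim.toDouble)))
    }

    val goodRows = parsed.collect { case (_, Success(r)) => r }       // rows matching the expected format
    val badRows  = parsed.collect { case (line, Failure(_)) => line } // rows to set aside for review

    // goodRows can then be turned into a DataFrame (e.g. goodRows.toDF() with the SQL implicits
    // imported), while badRows can be saved or inspected separately.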
Data/File structure Validation
Dears,

Is there any way to validate CSV, JSON, etc. files while loading them into a DataFrame? I need to ignore corrupted rows (rows that do not match the schema).

Thanks,
Ahmed Nawwar
Re: Any IRC channel on Spark?
Dears,

Are there any instructions for building Spark 1.3.0 on Windows 7? I tried

    mvn -Phive -Phive-thriftserver -DskipTests clean package

but I got the errors below:

[INFO] Spark Project Parent POM ........................ SUCCESS [  7.845 s]
[INFO] Spark Project Networking ........................ SUCCESS [ 26.209 s]
[INFO] Spark Project Shuffle Streaming Service ......... SUCCESS [  9.701 s]
[INFO] Spark Project Core .............................. SUCCESS [04:29 min]
[INFO] Spark Project Bagel ............................. SUCCESS [ 22.215 s]
[INFO] Spark Project GraphX ............................ SUCCESS [ 59.676 s]
[INFO] Spark Project Streaming ......................... SUCCESS [01:46 min]
[INFO] Spark Project Catalyst .......................... SUCCESS [01:40 min]
[INFO] Spark Project SQL ............................... SUCCESS [03:05 min]
[INFO] Spark Project ML Library ........................ FAILURE [03:49 min]
[INFO] Spark Project Tools ............................. SKIPPED
[INFO] Spark Project Hive .............................. SKIPPED
[INFO] Spark Project REPL .............................. SKIPPED
[INFO] Spark Project Hive Thrift Server ................ SKIPPED
[INFO] Spark Project Assembly .......................... SKIPPED
[INFO] Spark Project External Twitter .................. SKIPPED
[INFO] Spark Project External Flume Sink ............... SKIPPED
[INFO] Spark Project External Flume .................... SKIPPED
[INFO] Spark Project External MQTT ..................... SKIPPED
[INFO] Spark Project External ZeroMQ ................... SKIPPED
[INFO] Spark Project External Kafka .................... SKIPPED
[INFO] Spark Project Examples .......................... SKIPPED
[INFO] Spark Project External Kafka Assembly ........... SKIPPED
[INFO] BUILD FAILURE
[INFO] Total time: 16:58 min
[INFO] Finished at: 2015-03-17T11:04:40+03:00
[INFO] Final Memory: 77M/1840M
[ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on project spark-mllib_2.10: Failed during scalastyle execution -> [Help 1]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :spark-mllib_2.10

On Tue, Mar 17, 2015 at 10:06 AM, Akhil Das ak...@sigmoidanalytics.com wrote:

There's one on Freenode, you can join #Apache-Spark. There are around 60 people idling. :)

Thanks
Best Regards

On Mon, Mar 16, 2015 at 10:46 PM, Feng Lin lfliu.x...@gmail.com wrote:

Hi everyone,

I'm wondering whether there is a possibility to set up an official IRC channel on Freenode. I noticed that a lot of Apache projects have such a channel to let people talk directly.

Best,
Michael
build spark 1.3.0 on windows 7.
Sorry for the old subject - I am correcting it.

On Tue, Mar 17, 2015 at 11:47 AM, Ahmed Nawar ahmed.na...@gmail.com wrote:

Dears,

Are there any instructions for building Spark 1.3.0 on Windows 7? I tried

    mvn -Phive -Phive-thriftserver -DskipTests clean package

but I got the errors below:

[INFO] Spark Project ML Library ........................ FAILURE [03:49 min]
[INFO] BUILD FAILURE
[ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on project spark-mllib_2.10: Failed during scalastyle execution -> [Help 1]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :spark-mllib_2.10
Re: Building Spark on Windows WAS: Any IRC channel on Spark?
Scalastyle violation(s).
    at org.scalastyle.maven.plugin.ScalastyleViolationCheckMojo.performCheck(ScalastyleViolationCheckMojo.java:230)
    ... 22 more
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :spark-mllib_2.10

C:\Nawwar\Hadoop\spark\spark-1.3.0>mvn -X -Phive -Phive-thriftserver -DskipTests clean package

On Tue, Mar 17, 2015 at 12:14 PM, Ted Yu yuzhih...@gmail.com wrote:

Have you tried with the -X switch?

Thanks

On Mar 17, 2015, at 1:47 AM, Ahmed Nawar ahmed.na...@gmail.com wrote:

Dears,

Are there any instructions for building Spark 1.3.0 on Windows 7? I tried

    mvn -Phive -Phive-thriftserver -DskipTests clean package

but I got the errors below:

[INFO] Spark Project ML Library ........................ FAILURE [03:49 min]
[INFO] BUILD FAILURE
[ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on project spark-mllib_2.10: Failed during scalastyle execution -> [Help 1]
Re: Building Spark on Windows WAS: Any IRC channel on Spark?
Dear Yu,

Do you mean scalastyle-output.xml? I copied its content below:

    <?xml version="1.0" encoding="UTF-8"?>
    <checkstyle version="5.0">
     <file name="C:\Nawwar\Hadoop\spark\spark-1.3.0\mllib\src\main\scala\org\apache\spark\mllib\clustering\LDAModel.scala">
      <error severity="error" message="Input length = 1"/>
     </file>
    </checkstyle>

On Tue, Mar 17, 2015 at 4:11 PM, Ted Yu yuzhih...@gmail.com wrote:

Can you look in the build output for the scalastyle warning in the mllib module?

Cheers

On Mar 17, 2015, at 3:00 AM, Ahmed Nawar ahmed.na...@gmail.com wrote:

Dear Yu,

With -X I got the error below:

[INFO] Reactor Summary:
[INFO] Spark Project Parent POM ........................ SUCCESS [  7.418 s]
[INFO] Spark Project Networking ........................ SUCCESS [ 16.551 s]
[INFO] Spark Project Shuffle Streaming Service ......... SUCCESS [ 10.392 s]
[INFO] Spark Project Core .............................. SUCCESS [04:26 min]
[INFO] Spark Project Bagel ............................. SUCCESS [ 23.876 s]
[INFO] Spark Project GraphX ............................ SUCCESS [01:02 min]
[INFO] Spark Project Streaming ......................... SUCCESS [01:46 min]
[INFO] Spark Project Catalyst .......................... SUCCESS [01:45 min]
[INFO] Spark Project SQL ............................... SUCCESS [02:16 min]
[INFO] Spark Project ML Library ........................ FAILURE [02:38 min]
[INFO] Spark Project Tools ............................. SKIPPED
[INFO] Spark Project Hive .............................. SKIPPED
[INFO] Spark Project REPL .............................. SKIPPED
[INFO] Spark Project Hive Thrift Server ................ SKIPPED
[INFO] Spark Project Assembly .......................... SKIPPED
[INFO] Spark Project External Twitter .................. SKIPPED
[INFO] Spark Project External Flume Sink ............... SKIPPED
[INFO] Spark Project External Flume .................... SKIPPED
[INFO] Spark Project External MQTT ..................... SKIPPED
[INFO] Spark Project External ZeroMQ ................... SKIPPED
[INFO] Spark Project External Kafka .................... SKIPPED
[INFO] Spark Project Examples .......................... SKIPPED
[INFO] Spark Project External Kafka Assembly ........... SKIPPED
[INFO] BUILD FAILURE
[INFO] Total time: 14:54 min
[INFO] Finished at: 2015-03-17T12:54:19+03:00
[INFO] Final Memory: 76M/1702M
[ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on project spark-mllib_2.10: Failed during scalastyle execution: You have 1 Scalastyle violation(s). -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on project spark-mllib_2.10: Failed during scalastyle execution
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:216)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main