Re: spark streaming 1.3 kafka topic error

2015-08-27 Thread Ahmed Nawar
Dears,

I need to commit a DB transaction for each partition, not for each row.
The code below didn't work for me.


rdd.mapPartitions(partitionOfRecords => {

DBConnectionInit()

val results = partitionOfRecords.map(..)

DBConnection.commit()


})



Best regards,

Ahmed Atef Nawwar

Data Management & Big Data Consultant






On Thu, Aug 27, 2015 at 4:16 PM, Cody Koeninger c...@koeninger.org wrote:

 Your kafka broker died or you otherwise had a rebalance.

 Normally spark retries take care of that.

 Is there something going on with your kafka installation, that rebalance
 is taking especially long?

 Yes, increasing backoff / max number of retries will help, but it's
 better to figure out what's going on with kafka.

 On Wed, Aug 26, 2015 at 9:07 PM, Shushant Arora shushantaror...@gmail.com
  wrote:

 Hi

 My streaming application gets killed with the error below:

 15/08/26 21:55:20 ERROR kafka.DirectKafkaInputDStream:
 ArrayBuffer(kafka.common.NotLeaderForPartitionException,
 kafka.common.NotLeaderForPartitionException,
 kafka.common.NotLeaderForPartitionException,
 kafka.common.NotLeaderForPartitionException,
 kafka.common.NotLeaderForPartitionException,
 org.apache.spark.SparkException: Couldn't find leader offsets for
 Set([testtopic,223], [testtopic,205], [testtopic,64], [testtopic,100],
 [testtopic,193]))
 15/08/26 21:55:20 ERROR scheduler.JobScheduler: Error generating jobs for
 time 144062612 ms
 org.apache.spark.SparkException:
 ArrayBuffer(kafka.common.NotLeaderForPartitionException,
 org.apache.spark.SparkException: Couldn't find leader offsets for
 Set([testtopic,115]))
 at
 org.apache.spark.streaming.kafka.DirectKafkaInputDStream.latestLeaderOffsets(DirectKafkaInputDStream.scala:94)
 at
 org.apache.spark.streaming.kafka.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:116)
 at
 org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
 at
 org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:300)
 at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
 at
 org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:299)
 at



 The Kafka params printed in the job logs are:
  value.serializer = class
 org.apache.kafka.common.serialization.StringSerializer
 key.serializer = class
 org.apache.kafka.common.serialization.StringSerializer
 block.on.buffer.full = true
 retry.backoff.ms = 100
 buffer.memory = 1048576
 batch.size = 16384
 metrics.sample.window.ms = 3
 metadata.max.age.ms = 30
 receive.buffer.bytes = 32768
 timeout.ms = 3
 max.in.flight.requests.per.connection = 5
 bootstrap.servers = [broker1:9092, broker2:9092, broker3:9092]
 metric.reporters = []
 client.id =
 compression.type = none
 retries = 0
 max.request.size = 1048576
 send.buffer.bytes = 131072
 acks = all
 reconnect.backoff.ms = 10
 linger.ms = 0
 metrics.num.samples = 2
 metadata.fetch.timeout.ms = 6


 Is the kafka broker going down and getting the job killed? What's the
 best way to handle it?
 Will increasing retries and backoff time help, and to what values should
 those be set so the streaming application never fails - rather, it keeps
 retrying after a few seconds and sends an event so that my custom code can
 send a notification that the kafka broker is down, if that is the cause.


 Thanks
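
A minimal sketch of the retry and backoff settings mentioned above for the
direct stream, assuming Spark 1.3+ with the Kafka 0.8 consumer; the broker
list, topic and values are illustrative only, and the right numbers depend on
how long leader election takes in the cluster:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Driver-side retries when leader offsets cannot be found (default is 1).
val conf = new SparkConf()
  .setAppName("direct-stream-retries")
  .set("spark.streaming.kafka.maxRetries", "5")

val ssc = new StreamingContext(conf, Seconds(10))

// Consumer-side: wait longer before re-fetching leader metadata after a rebalance.
val kafkaParams = Map(
  "metadata.broker.list" -> "broker1:9092,broker2:9092,broker3:9092",
  "refresh.leader.backoff.ms" -> "2000")

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("testtopic"))

Even with these raised, a batch that exhausts its retries still fails, so
surfacing that as a notification has to happen in the application's own error
handling or monitoring.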





commit DB Transaction for each partition

2015-08-27 Thread Ahmed Nawar
Dears,

I need to commit a DB transaction for each partition, not for each row.
The code below didn't work for me.


rdd.mapPartitions(partitionOfRecords => {

DBConnectionInit()

val results = partitionOfRecords.map(..)

DBConnection.commit()

results

})



Best regards,

Ahmed Atef Nawwar

Data Management & Big Data Consultant


Commit DB Transaction for each partition

2015-08-27 Thread Ahmed Nawar
Thanks for the foreach idea. But once I used it I got an empty RDD; I think
that is because results is an iterator.

Yes, I know map is lazy, but I expected there is a way to force an action.

I cannot use foreachPartition because I need to reuse the new RDD after some
maps.



On Thu, Aug 27, 2015 at 5:11 PM, Cody Koeninger c...@koeninger.org wrote:


 Map is lazy.  You need an actual action, or nothing will happen.  Use
 foreachPartition, or do an empty foreach after the map.
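
For the case where the per-partition results do not need to feed a later RDD,
a minimal sketch of the foreachPartition route described above, using rdd as in
the earlier snippet; the JDBC URL and table are placeholders, and
DBConnectionInit()/DBConnection are assumed to wrap something like this:

import java.sql.DriverManager

rdd.foreachPartition { partitionOfRecords =>
  // foreachPartition is an action, so this runs immediately, once per partition.
  val conn = DriverManager.getConnection("jdbc:postgresql://db-host/mydb", "user", "pass")
  conn.setAutoCommit(false)
  try {
    val stmt = conn.prepareStatement("INSERT INTO events(payload) VALUES (?)") // hypothetical table
    partitionOfRecords.foreach { record =>
      stmt.setString(1, record.toString) // write each row inside the same transaction
      stmt.executeUpdate()
    }
    conn.commit() // one commit per partition, not per row
  } catch {
    case e: Exception => conn.rollback(); throw e
  } finally {
    conn.close()
  }
}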



Re: Commit DB Transaction for each partition

2015-08-27 Thread Ahmed Nawar
Yes, of course, I am doing that. But once I added results.foreach(row => {})
I got an empty RDD.



rdd.mapPartitions(partitionOfRecords => {

DBConnectionInit()

val results = partitionOfRecords.map(..)

DBConnection.commit()

results.foreach(row => {})

results

})



On Thu, Aug 27, 2015 at 10:18 PM, Cody Koeninger c...@koeninger.org wrote:

 You need to return an iterator from the closure you provide to
 mapPartitions
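
Putting the two hints together - commit once per partition and still return an
iterator from mapPartitions - one option is to materialize the partition before
the database work, so the writes do not consume the iterator that gets handed
back. A sketch only: transform and save are hypothetical stand-ins for the real
row logic, and DBConnectionInit()/DBConnection are the stubs from the snippet
above:

val newRDD = rdd.mapPartitions { partitionOfRecords =>
  DBConnectionInit()
  // Materialize first: iterating for the DB writes would otherwise exhaust the
  // iterator that mapPartitions must return, which is why the RDD came out empty.
  val results = partitionOfRecords.map(transform).toList
  results.foreach(save)   // write the rows inside the per-partition transaction
  DBConnection.commit()   // one commit per partition
  results.iterator        // hand back a fresh iterator over the same rows
}

The trade-off is that the whole partition is buffered in memory, and
mapPartitions is still lazy, so nothing runs until an output action (foreach,
count, a save) is applied to newRDD; a retried task also re-runs its writes, so
the sink needs to tolerate duplicates.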



Re: Commit DB Transaction for each partition

2015-08-27 Thread Ahmed Nawar
Thanks a lot for your support. It is working now.
I wrote it as shown below:


val newRDD = rdd.mapPartitions { partition => {

  val result = partition.map(.)

  result
}
}

newRDD.foreach {

}


On Thu, Aug 27, 2015 at 10:34 PM, Cody Koeninger c...@koeninger.org wrote:

 This job contains a spark output action, and is what I originally meant:


 rdd.mapPartitions {
   result
 }.foreach {

 }

 This job is just a transformation, and won't do anything unless you have
 another output action.  Not to mention, it will exhaust the iterator, as
 you noticed:

 rdd.mapPartitions {
   result.foreach
   result
 }
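
An alternative not used in this thread, but worth noting when the derived RDD
has to be reused afterwards: keep mapPartitions side-effect free, cache the
result, and do the per-partition commit in a separate foreachPartition job.
Here transform, save and furtherProcessing are hypothetical stand-ins, and
DBConnectionInit()/DBConnection are the stubs from the earlier snippets:

val newRDD = rdd.mapPartitions(partition => partition.map(transform)).cache()

// Output action: materializes newRDD into the cache and commits once per partition.
newRDD.foreachPartition { rows =>
  DBConnectionInit()
  rows.foreach(save)
  DBConnection.commit()
}

// Later transformations reuse the cached partitions instead of recomputing them.
val downstream = newRDD.map(furtherProcessing)

As with any side-effecting Spark job, a retried task re-runs its writes, so the
commits are at-least-once unless the writes are made idempotent.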




Re: Data/File structure Validation

2015-03-23 Thread Ahmed Nawar
Dear Taotao,

Yes, I tried sparkCSV.


Thanks,
Nawwar
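
For reference, later spark-csv releases (with the DataFrameReader API that
arrived in Spark 1.4) expose a parse mode that simply drops rows that do not
fit the declared schema; it may not have existed in the version tried above.
A sketch, assuming a spark-csv build that supports the mode option; the schema
and path are illustrative:

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val sqlContext = new SQLContext(sc) // sc: an existing SparkContext

val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = true)))

// DROPMALFORMED discards rows that cannot be parsed against the schema
// (e.g. the wrong number of columns) instead of failing the whole load.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .schema(schema)
  .option("header", "true")
  .option("mode", "DROPMALFORMED")
  .load("/path/to/input.csv")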


On Mon, Mar 23, 2015 at 12:20 PM, Taotao.Li taotao...@datayes.com wrote:

 can it load successfully if the format is invalid?

 --
 *From: *Ahmed Nawar ahmed.na...@gmail.com
 *To: *user@spark.apache.org
 *Sent: *Monday, March 23, 2015, 4:48:54 PM
 *Subject: *Data/File structure Validation

 Dears,

 Is there any way to validate CSV, JSON ... files while loading them into a
 DataFrame?
 I need to ignore corrupted rows (rows that do not match the schema).


 Thanks,
 Ahmed Nawwar



 --


 *---*

 *Thanks & Best regards*

 李涛涛 Taotao · Li  |  Fixed Income@Datayes  |  Software Engineer

 Address: Wanxiang Tower 8F, Lujiazui West Rd. No. 99, Pudong New District,
 Shanghai, 200120

 Phone: 021-60216502  Mobile: +86-18202171279




Re: Data/File structure Validation

2015-03-23 Thread Ahmed Nawar
Dear Raunak,

   The source system provided logs with some errors. I need to make sure each
row is in the correct format (the number of columns/attributes and the data
types are correct) and move the incorrect rows to a separate list.

Of course I can write my own logic, but I want to make sure there is no direct
way first.


Thanks,
Nawwar
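
Absent a built-in option, a minimal sketch of the "own logic" route for CSV:
check each line's column count and types, keep the good rows for the DataFrame
and collect the rejects in a separate RDD. The three-column layout
(id: Int, name: String, amount: Double) and the path are purely illustrative:

import scala.util.Try

val lines = sc.textFile("/path/to/input.csv") // sc: an existing SparkContext

val checked = lines.map { line =>
  val cols = line.split(",", -1)
  // A row is valid if it has the expected column count and parseable types.
  val valid = cols.length == 3 &&
    Try(cols(0).trim.toInt).isSuccess &&
    Try(cols(2).trim.toDouble).isSuccess
  (valid, line)
}

val goodRows = checked.filter { case (ok, _) => ok }.map { case (_, line) => line }
val badRows  = checked.filter { case (ok, _) => !ok }.map { case (_, line) => line }

goodRows can then be split into columns and turned into a DataFrame with
sqlContext.createDataFrame against an explicit schema, while badRows is kept or
saved as the list of rejected records.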


On Mon, Mar 23, 2015 at 1:14 PM, Raunak Jhawar raunak.jha...@gmail.com
wrote:

 CSV is a structured format and JSON is not (it is semi-structured). It is
 obvious that different JSON documents may have differing schemas. What are you
 trying to do here?

 --
 Thanks,
 Raunak Jhawar
 m: 09820890034






 On Mon, Mar 23, 2015 at 2:18 PM, Ahmed Nawar ahmed.na...@gmail.com
 wrote:

 Dears,

 Is there any way to validate CSV, JSON ... files while loading them into a
 DataFrame?
 I need to ignore corrupted rows (rows that do not match the schema).


 Thanks,
 Ahmed Nawwar





Data/File structure Validation

2015-03-23 Thread Ahmed Nawar
Dears,

Is there any way to validate CSV, JSON ... files while loading them into a
DataFrame?
I need to ignore corrupted rows (rows that do not match the schema).


Thanks,
Ahmed Nawwar


Re: Any IRC channel on Spark?

2015-03-17 Thread Ahmed Nawar
Dears,

Are there any instructions to build Spark 1.3.0 on Windows 7?

I tried mvn -Phive -Phive-thriftserver -DskipTests clean package but
I got the errors below:


[INFO] Spark Project Parent POM ... SUCCESS [
 7.845 s]
[INFO] Spark Project Networking ... SUCCESS [
26.209 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [
 9.701 s]
[INFO] Spark Project Core . SUCCESS [04:29
min]
[INFO] Spark Project Bagel  SUCCESS [
22.215 s]
[INFO] Spark Project GraphX ... SUCCESS [
59.676 s]
[INFO] Spark Project Streaming  SUCCESS [01:46
min]
[INFO] Spark Project Catalyst . SUCCESS [01:40
min]
[INFO] Spark Project SQL .. SUCCESS [03:05
min]
[INFO] Spark Project ML Library ... FAILURE [03:49
min]
[INFO] Spark Project Tools  SKIPPED
[INFO] Spark Project Hive . SKIPPED
[INFO] Spark Project REPL . SKIPPED
[INFO] Spark Project Hive Thrift Server ... SKIPPED
[INFO] Spark Project Assembly . SKIPPED
[INFO] Spark Project External Twitter . SKIPPED
[INFO] Spark Project External Flume Sink .. SKIPPED
[INFO] Spark Project External Flume ... SKIPPED
[INFO] Spark Project External MQTT  SKIPPED
[INFO] Spark Project External ZeroMQ .. SKIPPED
[INFO] Spark Project External Kafka ... SKIPPED
[INFO] Spark Project Examples . SKIPPED
[INFO] Spark Project External Kafka Assembly .. SKIPPED
[INFO]

[INFO] BUILD FAILURE
[INFO]

[INFO] Total time: 16:58 min
[INFO] Finished at: 2015-03-17T11:04:40+03:00
[INFO] Final Memory: 77M/1840M
[INFO]

[ERROR] Failed to execute goal
org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on project
spark-mllib_2.10: Failed during scalastyle execution: You have 1 Scalastyle
violation(s). -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions,
please read the following articles:
[ERROR] [Help 1]
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the
command
[ERROR]   mvn <goals> -rf :spark-mllib_2.10








On Tue, Mar 17, 2015 at 10:06 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 There's one on Freenode. You can join #Apache-Spark; there's like 60 people
 idling. :)

 Thanks
 Best Regards

 On Mon, Mar 16, 2015 at 10:46 PM, Feng Lin lfliu.x...@gmail.com wrote:

 Hi, everyone,
  I'm wondering whether there is a possibility to set up an official IRC
 channel on Freenode.

 I noticed that a lot of Apache projects have such a channel to let
 people talk directly.

 Best
 Michael





build spark 1.3.0 on windows 7.

2015-03-17 Thread Ahmed Nawar
Sorry for the old subject; I am correcting it.

On Tue, Mar 17, 2015 at 11:47 AM, Ahmed Nawar ahmed.na...@gmail.com wrote:





Re: Building Spark on Windows WAS: Any IRC channel on Spark?

2015-03-17 Thread Ahmed Nawar
Scalastyle violation(s).
at
org.scalastyle.maven.plugin.ScalastyleViolationCheckMojo.performCheck(ScalastyleViolationCheckMojo.java:230)
... 22 more
[ERROR]
[ERROR]
[ERROR] For more information about the errors and possible solutions,
please read the following articles:
[ERROR] [Help 1]
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the
command
[ERROR]   mvn <goals> -rf :spark-mllib_2.10
C:\Nawwar\Hadoop\spark\spark-1.3.0>mvn -X -Phive -Phive-thriftserver
-DskipTests clean package





On Tue, Mar 17, 2015 at 12:14 PM, Ted Yu yuzhih...@gmail.com wrote:

 Have you tried with -X switch ?

 Thanks









Re: Building Spark on Windows WAS: Any IRC channel on Spark?

2015-03-17 Thread Ahmed Nawar
Dear Yu,

   Do you mean scalastyle-output.xml? I copied its content below:


<?xml version="1.0" encoding="UTF-8"?>
<checkstyle version="5.0">
 <file
name="C:\Nawwar\Hadoop\spark\spark-1.3.0\mllib\src\main\scala\org\apache\spark\mllib\clustering\LDAModel.scala">
  <error severity="error" message="Input length = 1"></error>
 </file>
</checkstyle>



On Tue, Mar 17, 2015 at 4:11 PM, Ted Yu yuzhih...@gmail.com wrote:

 Can you look in build output for scalastyle warning in mllib module ?

 Cheers



 On Mar 17, 2015, at 3:00 AM, Ahmed Nawar ahmed.na...@gmail.com wrote:

 Dear Yu,

    With -X I got the error below.


 [INFO]
 
 [INFO] Reactor Summary:
 [INFO]
 [INFO] Spark Project Parent POM ... SUCCESS [
  7.418 s]
 [INFO] Spark Project Networking ... SUCCESS [
 16.551 s]
 [INFO] Spark Project Shuffle Streaming Service  SUCCESS [
 10.392 s]
 [INFO] Spark Project Core . SUCCESS [04:26
 min]
 [INFO] Spark Project Bagel  SUCCESS [
 23.876 s]
 [INFO] Spark Project GraphX ... SUCCESS [01:02
 min]
 [INFO] Spark Project Streaming  SUCCESS [01:46
 min]
 [INFO] Spark Project Catalyst . SUCCESS [01:45
 min]
 [INFO] Spark Project SQL .. SUCCESS [02:16
 min]
 [INFO] Spark Project ML Library ... FAILURE [02:38
 min]
 [INFO] Spark Project Tools  SKIPPED
 [INFO] Spark Project Hive . SKIPPED
 [INFO] Spark Project REPL . SKIPPED
 [INFO] Spark Project Hive Thrift Server ... SKIPPED
 [INFO] Spark Project Assembly . SKIPPED
 [INFO] Spark Project External Twitter . SKIPPED
 [INFO] Spark Project External Flume Sink .. SKIPPED
 [INFO] Spark Project External Flume ... SKIPPED
 [INFO] Spark Project External MQTT  SKIPPED
 [INFO] Spark Project External ZeroMQ .. SKIPPED
 [INFO] Spark Project External Kafka ... SKIPPED
 [INFO] Spark Project Examples . SKIPPED
 [INFO] Spark Project External Kafka Assembly .. SKIPPED
 [INFO]
 
 [INFO] BUILD FAILURE
 [INFO]
 
 [INFO] Total time: 14:54 min
 [INFO] Finished at: 2015-03-17T12:54:19+03:00
 [INFO] Final Memory: 76M/1702M
 [INFO]
 
 [ERROR] Failed to execute goal
 org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on project
 spark-mllib_2.10: Failed during scalastyle execution: You have 1 Scalastyle
 violation(s). -> [Help 1]
 org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute
 goal org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on
 project spark-mllib_2.10: Failed during scalastyle execution
 at
 org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:216)
 at
 org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
 at
 org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
 at
 org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
 at
 org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
 at
 org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
 at
 org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
 at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
 at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
 at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
 at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
 at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at
 org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
 at
 org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
 at
 org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
 at
 org.codehaus.plexus.classworlds.launcher.Launcher.main