Re: Issues with HDP 2.4.0.2.1.3.0-563

2014-08-04 Thread Sean Owen
For any Hadoop 2.4 distro, yes, set hadoop.version but also set
-Phadoop-2.4. http://spark.apache.org/docs/latest/building-with-maven.html
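
For example, a sketch against the HDP build in the subject line (untested;
substitute your exact version string):

mvn -Phadoop-2.4 -Dhadoop.version=2.4.0.2.1.3.0-563 -DskipTests clean package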

On Mon, Aug 4, 2014 at 9:15 AM, Patrick Wendell pwend...@gmail.com wrote:
 For Hortonworks, I believe it should work to just link against the
 corresponding upstream version, i.e. just set the Hadoop version to 2.4.0.

 Does that work?

 - Patrick


 On Mon, Aug 4, 2014 at 12:13 AM, Ron's Yahoo! zlgonza...@yahoo.com.invalid
 wrote:

 Hi,
   Not sure whose issue this is, but if I run make-distribution using HDP
 2.4.0.2.1.3.0-563 as the hadoop version (replacing it in
 make-distribution.sh), I get a strange error with the exception below. If I
 use a slightly older version of HDP (2.4.0.2.1.2.0-402) with
 make-distribution, everything works fine for me with the generated
 assembly, on either Spark 1.0.0 or 1.0.1.

   Should I file a JIRA or is this a known issue?

 Thanks,
 Ron

 Exception in thread "main" org.apache.spark.SparkException: Job aborted
 due to stage failure: Task 0.0:0 failed 1 times, most recent failure:
 Exception failure in TID 0 on host localhost:
 java.lang.IncompatibleClassChangeError: Found interface
 org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected

 org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)

 org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:111)
 org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:99)
 org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:61)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
 org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)

 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
 org.apache.spark.scheduler.Task.run(Task.scala:51)

 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)

 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:745)






Re: Issues with HDP 2.4.0.2.1.3.0-563

2014-08-04 Thread Patrick Wendell
Can you try building without any of the special `hadoop.version` flags and
just building only with -Phadoop-2.4? In the past users have reported
issues trying to build random spot versions... I think HW is supposed to be
compatible with the normal 2.4.0 build.
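
In other words, something like the sketch below -- the -Phive and -Pyarn
profiles stay, but the version overrides are dropped so the profile's
defaults apply:

mvn clean package -Phadoop-2.4 -Phive -Pyarn -DskipTests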


On Mon, Aug 4, 2014 at 8:35 AM, Ron's Yahoo! zlgonza...@yahoo.com.invalid
wrote:

 Thanks, I ensured that $SPARK_HOME/pom.xml had the HDP repository under
 the repositories element. I also confirmed that the build fails fast when
 it can't find a version, so it does seem able to resolve the versions it
 needs to build the distribution.
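
 For reference, the repository entry is shaped roughly like this (the id
 and URL here are illustrative rather than copied from my pom):

 <repository>
   <id>hortonworks-releases</id>
   <url>http://repo.hortonworks.com/content/repositories/releases/</url>
 </repository>
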
 I ran the following (generated from make-distribution.sh), but it did not
 fix the problem, whereas building with an older version
 (2.4.0.2.1.2.0-402) worked. Anything else I can try?

 mvn clean package -Phadoop-2.4 -Phive -Pyarn
 -Dyarn.version=2.4.0.2.1.2.0-563 -Dhadoop.version=2.4.0.2.1.3.0-563
 -DskipTests


 Thanks,
 Ron


 On Aug 4, 2014, at 7:13 AM, Steve Nunez snu...@hortonworks.com wrote:

 Provided you've got the HWX repo in your pom.xml, you can build with this
 line:

 mvn -Pyarn -Phive -Phadoop-2.4 -Dhadoop.version=2.4.0.2.1.1.0-385
 -DskipTests clean package

 I haven't tried building a distro, but it should be similar.
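
 From memory -- flag names worth double-checking against a Spark 1.0.x
 checkout -- the make-distribution.sh equivalent would be roughly:

 ./make-distribution.sh --hadoop 2.4.0.2.1.1.0-385 --with-yarn --with-hive --tgz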


 - SteveN


Re: Issues with HDP 2.4.0.2.1.3.0-563

2014-08-04 Thread Patrick Wendell
Ah I see, yeah you might need to set hadoop.version and yarn.version. I
thought the profile set this automatically.
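
Something like this sketch, i.e. the same HDP build string for both
properties:

mvn clean package -Phadoop-2.4 -Phive -Pyarn \
  -Dhadoop.version=2.4.0.2.1.3.0-563 -Dyarn.version=2.4.0.2.1.3.0-563 \
  -DskipTests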


On Mon, Aug 4, 2014 at 10:02 AM, Ron's Yahoo! zlgonza...@yahoo.com wrote:

 I meant that yarn.version and hadoop.version defaulted to 1.0.4, so the
 yarn build fails since 1.0.4 doesn't exist for yarn...

 Thanks,
 Ron

 On Aug 4, 2014, at 10:01 AM, Ron's Yahoo! zlgonza...@yahoo.com wrote:

 That failed since it defaulted the versions for yarn and hadoop.
 I'll give it a try with just 2.4.0 for both yarn and hadoop...
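 (i.e., roughly: mvn clean package -Phadoop-2.4 -Phive -Pyarn
 -Dhadoop.version=2.4.0 -Dyarn.version=2.4.0 -DskipTests)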

 Thanks,
 Ron


Re: Issues with HDP 2.4.0.2.1.3.0-563

2014-08-04 Thread Steve Nunez
I don’t think there is an hwx profile, but there probably should be.

- Steve



Re: Issues with HDP 2.4.0.2.1.3.0-563

2014-08-04 Thread Sean Owen
What would such a profile do though? In general building for a
specific vendor version means setting hadoop.version and/or
yarn.version. Any hard-coded value is unlikely to match what a
particular user needs. Setting protobuf versions and so on is already
done by the generic profiles.
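
(For the curious, the hadoop-2.4 profile in the pom is shaped roughly like
this -- property names from memory, so check the actual pom:

<profile>
  <id>hadoop-2.4</id>
  <properties>
    <hadoop.version>2.4.0</hadoop.version>
    <protobuf.version>2.5.0</protobuf.version>
    <jets3t.version>0.9.0</jets3t.version>
  </properties>
</profile>
)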

In a similar vein, I am not clear on why there's a mapr profile in the
build. Its versions are about to be out of date and won't work with
upcoming HBase changes, for example.

(Elsewhere in the build I think it wouldn't hurt to clear out
cloudera-specific profiles and releases too -- they're not in the pom
but are in the distribution script. It's the vendor's problem.)

This isn't any argument about being purist but just that I am not sure
these are things that the project can meaningfully bother with.

It makes sense to set vendor repos in the pom for convenience, and
makes sense to run smoke tests in Jenkins against particular versions.

$0.02
Sean

On Mon, Aug 4, 2014 at 6:21 PM, Steve Nunez snu...@hortonworks.com wrote:
 I don’t think there is an hwx profile, but there probably should be.





Re: Issues with HDP 2.4.0.2.1.3.0-563

2014-08-04 Thread Steve Nunez
Hmm. Fair enough. I hadn't given that answer much thought and on
reflection think you're right in that a profile would just be a bad hack.







