[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2015-04-27 Thread darose
Github user darose commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-96883761
  
On 04/27/2015 07:11 AM, Sean Owen wrote:
> @mag- if you're talking about what I think you are, it was a temporary
> thing that's long since gone already
> https://github.com/apache/spark/pull/629/files

I think @srowen is correct.  A while back I upgraded to use a newer 
version of Spark (and built it using the correct -Dhadoop.version= and 
-Phadoop- flags) and the problem went away.

DR




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2015-04-27 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-96643172
  
Agree but that doesn't exist in `master` anyway. Now the SBT build drives 
off the Maven build.





[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2015-04-27 Thread mag-
Github user mag- commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-96642739
  
Well:
`val jets3tVersion = if ("^2\\.[3-9]+".r.findFirstIn(hadoopVersion).isDefined) "0.9.0" else "0.7.1"`
It should probably be the other way round: if the Hadoop version is lower than 
2.3 we use 0.7.1, and otherwise 0.9.0.
Also, someone needs to test it with Hadoop 2.6/2.7, where S3 support was 
split out into hadoop-aws.
(I'm thinking that the mvn profile approach was maybe cleaner than this 
if/else...)
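
For illustration, a minimal sketch of a numeric comparison that could replace the regex, so that versions such as 2.10.x or 3.0.0 do not silently fall back to 0.7.1. The `Jets3tVersion` object and `forHadoop` method are hypothetical names, not taken from Spark's build, and the sketch does not address the hadoop-aws split mentioned above.

```scala
// Hypothetical helper for illustration; not Spark's actual build logic.
object Jets3tVersion {
  // Captures the leading "major.minor" of strings such as "2.3.0" or "2.3.0-cdh5.0.0".
  private val MajorMinor = """^(\d+)\.(\d+).*""".r

  /** Picks jets3t 0.7.1 for Hadoop older than 2.3 and 0.9.0 otherwise. */
  def forHadoop(hadoopVersion: String): String = hadoopVersion match {
    case MajorMinor(major, minor) =>
      val olderThan23 = major.toInt < 2 || (major.toInt == 2 && minor.toInt < 3)
      if (olderThan23) "0.7.1" else "0.9.0"
    case _ =>
      // Assumption: fail loudly rather than guess for unparseable versions.
      throw new IllegalArgumentException(s"Unrecognized Hadoop version: $hadoopVersion")
  }
}
```

Comparing the parsed numbers rather than matching digits means a future major version bump only changes the answer if the comparison itself is revisited.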





[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2015-04-27 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-96611185
  
@mag- if you're talking about what I think you are, it was a temporary 
thing that's long since gone already 
https://github.com/apache/spark/pull/629/files





[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2015-04-27 Thread mag-
Github user mag- commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-96587264
  
Are you aware that all these regexp hacks will break when Hadoop changes 
version to 3.0.0?






[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2015-04-26 Thread LuqmanSahaf
Github user LuqmanSahaf commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-96522017
  
@darose I am facing the VerifyError you mentioned in one of the comments. 
Can you tell me how you solved that error?





[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-05-05 Thread CodingCat
Github user CodingCat commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-42253192
  
fixed in #629 




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-05-05 Thread CodingCat
Github user CodingCat closed the pull request at:

https://github.com/apache/spark/pull/468




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-05-03 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-42113320
  
@srowen YARN version does need to be separate from hadoop version. 
Downstream consumers of our build sometimes do this. For instance, if they want 
to build against a custom HDFS distro (e.g. pivotal, IBM, or something) but 
want to link against the upstream apache YARN repo. It's not something we do in 
binaries we distribute but it would be good to support it.
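
A minimal sketch of that arrangement, in which the YARN version defaults to the Hadoop version but can still be overridden independently. The property names `hadoop.version` and `yarn.version` and the 1.0.4 default come from this thread; the `BuildVersions` object and the plain `Map` standing in for build properties are assumptions made for illustration.

```scala
// Hypothetical sketch; a plain Map stands in for -D build properties.
object BuildVersions {
  // Default Hadoop version as mentioned in this thread.
  val DefaultHadoopVersion = "1.0.4"

  def hadoopVersion(props: Map[String, String]): String =
    props.getOrElse("hadoop.version", DefaultHadoopVersion)

  // Falls back to the Hadoop version unless yarn.version is set explicitly.
  def yarnVersion(props: Map[String, String]): String =
    props.getOrElse("yarn.version", hadoopVersion(props))
}
```

With this shape, the common case needs only `hadoop.version`, while a build against a custom HDFS distro can still pin `yarn.version` separately.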

Think it's fine to remove hadoop.major.version - it seems unused.

Adding fancy profile activation would also be nice, but I think that it's 
not necessary as an immediate fix. We can just say in the build doc that "you 
need special profiles for the following hadoop versions" and give a small table 
or list explaining which profiles to activate.





[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-05-03 Thread berngp
Github user berngp commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-42112284
  
I think in general it is an edge case, but there are folks still using HDFS
1.0.x with a different version of YARN. That said, it is not my case.

I like what you suggested in another PR, where you reused the value of
hadoop.version to specify yarn.version. E.g.

$hadoop.version

Let me know if I should associate the small commits to specific PRs. Thanks
again for following up on those commits.

On Saturday, May 3, 2014, Guoqiang Li wrote:

> @srowen Related discussion in PR 502.
> @berngp Can you explain the reason of not
> using the same version of HDFS vs YARN?




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-05-03 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-42110042
  
@srowen  Related discussion in [PR 
502](https://github.com/apache/spark/pull/502).
@berngp Can you explain the reason for not using the same version of HDFS 
and YARN?




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-05-03 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-42109781
  
@witgo Hm, is there an example that comes up repeatedly? Is it ever 
intentional, or just some accident of someone's legacy deployment?  I don't 
know of a case of this, and it wouldn't come up with a distro or any 
semi-recent release of Hadoop, but maybe someone will say this comes up with 
the 1.x / 0.23.x lines somehow?




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-05-03 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-42109604
  
@srowen Not everyone uses the same version of HDFS and YARN.




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-05-03 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-42102935
  
@pwendell Before I begin, can I propose a refactoring of the profiles? It 
probably belongs in a different PR, but it would make this and similar changes 
easy to deal with.

We need profiles to deal with this. Profiles can be triggered explicitly 
(e.g. `-Phadoop-2.3`) or by property values (`-Dhadoop.version=2.3.0`). It's 
necessary to have things like `hadoop.version` be customizable, so it would be 
nice to also trigger the needed profiles from this. However, Maven lacks the 
ability to trigger on a range of property values; you can trigger on a 
particular value like "2.3.0" but not on "2.3.*" or a range like "[2.3.0,2.4.0)".

So it seems necessary to use a series of named profiles. Those profiles can 
set default version values, and those versions can be overridden. For example, 
it's nice to have a `hadoop-2.3` profile set `hadoop.version=2.3.0` for you, 
even though that can still be overridden.

(The SBT build can shadow these changes.)

After reading over the build and docs, I propose the following:

- Introduce a `hadoop-2.3` profile, similar to `hadoop-0.23`, to encompass 
2.3+-specific build changes, and one for `hadoop-2.2` as well (see later)
- `hadoop.major.version` appears to be unused -- remove it?
- I believe `yarn.version` can be removed; use `hadoop.version` in its 
place. Ideally these are always synced, no? All doc examples show 
`yarn.version` matching `hadoop.version` and the distribution script uses 
`SPARK_HADOOP_VERSION` for `yarn.version`. Now, the default Hadoop version is 
1.0.4 and there is no such YARN version. But the `yarn-alpha` profile sets 
`hadoop.version=0.23.7` to match the default `yarn.version=0.23.7` anyway. It 
seems like Hadoop 1.x + YARN is not intended anyway, which seems corroborated 
by the build documentation. 
- So, YARN-related profiles should not set `hadoop.version`, and in fact 
only serve to add the `yarn` child module

... and then the fix for this issue is trivial.

All of the build permutations listed in the documentation work under this 
new arrangement. Anyone want to see a PR or have objections?





[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-05-02 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-42096201
  
@srowen if you'd like to take a crack at this by the way, please do. I'll 
probably look at it on Sunday if no one else has.




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-05-02 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-42096004
  
@srowen I'd prefer not to remove it from the dependency graph if possible 
because it will break local builds. The best solution I see is to add a profile 
for Hadoop 2.3 and 2.4. For now I'd be fine to just require users to manually 
trigger it and document this in `building-with-maven`. In SBT we can actually 
just insert logic in the build based on the Hadoop profile. I'm guessing we'll 
have to get into the habit of doing this, since it seems like Spark is good at 
finding bugs in Hadoop's dependency graph. We should probably start testing 
Spark against Hadoop RCs if they publish them to Maven so we can give feedback.

I don't quite understand why the hadoop-client library doesn't advertise 
jets3t specifically... if I write a Java application that opens an S3 FileSystem 
and reads and writes data, don't I need jets3t to do that (i.e. if this is 
outside a MapReduce job)? Is this just a bug in Hadoop's dependencies?




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-05-02 Thread darose
Github user darose commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-42070116
  
I think I'm going to have to give up on getting Shark working on my 
existing CDH5 cluster right now.  I've tried everything I can think of (various 
binary releases, building both spark and shark myself against jets3t 0.9, 
various config tweaks, etc.) but I'm stuck at either the class not found error 
in https://issues.apache.org/jira/browse/SPARK-1556, or the verify error above. 
 I'll have to either wait until there's a new binary release, or look for an 
alternative.
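
One way to narrow down errors like these is to check which jar a class is actually being loaded from, since both the missing-class error and the verify error depend on which jets3t classes end up on the classpath. The following is a minimal sketch using the standard `ProtectionDomain`/`CodeSource` API; the `ClassOrigin` object and `locate` method are hypothetical names and not part of Spark or Shark.

```scala
// Hypothetical diagnostic helper; not part of Spark or Shark.
object ClassOrigin {
  /** Returns the jar or directory a class was loaded from, if the JVM reports one. */
  def locate(className: String): Option[String] =
    try {
      // initialize = false: look the class up without running its static initializers.
      val cls = Class.forName(className, false, getClass.getClassLoader)
      for {
        domain <- Option(cls.getProtectionDomain)
        source <- Option(domain.getCodeSource) // may be null, e.g. for JDK core classes
        url    <- Option(source.getLocation)
      } yield url.toString
    } catch {
      case _: ClassNotFoundException => None
    }
}
```

Calling `ClassOrigin.locate("org.jets3t.service.S3ServiceException")` from the same JVM that fails would show whether the class is visible at all and, if so, which jar supplies it.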




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-05-02 Thread darose
Github user darose commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-42027309
  
Man oh man, I cannot get this to work no way no how.  I tried rebuilding 
spark using the jets3t 0.9 jar, then tried rebuilding shark doing the same.  I 
keep getting a verify error - presumably because something in the call stack 
isn't compatible with the new jets3t version.  Anyone have any 
ideas/suggestions?  I'm at my wits' end on this.  Spent days, and still unable 
to get a working version of spark/shark running with CDH5.  Output below.

```
14/05/02 06:34:14 WARN scheduler.TaskSetManager: Loss was due to java.lang.VerifyError
java.lang.VerifyError: Bad type on operand stack
Exception Details:
  Location:
    org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore.initialize(Ljava/net/URI;Lorg/apache/hadoop/conf/Configuration;)V @38: invokespecial
  Reason:
    Type 'org/jets3t/service/security/AWSCredentials' (current frame, stack[3]) is not assignable to 'org/jets3t/service/security/ProviderCredentials'
  Current Frame:
    bci: @38
    flags: { }
    locals: { 'org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore', 'java/net/URI', 'org/apache/hadoop/conf/Configuration', 'org/apache/hadoop/fs/s3/S3Credentials', 'org/jets3t/service/security/AWSCredentials' }
    stack: { 'org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore', uninitialized 32, uninitialized 32, 'org/jets3t/service/security/AWSCredentials' }
  Bytecode:
    000: bb00 0259 b700 034e 2d2b 2cb6 0004 bb00
    010: 0559 2db6 0006 2db6 0007 b700 083a 042a
    020: bb00 0959 1904 b700 0ab5 000b a700 0b3a
    030: 042a 1904 b700 0d2a 2c12 0e03 b600 0fb5
    040: 0010 2a2c 1211 1400 12b6 0014 1400 15b8
    050: 0017 b500 182a 2c12 1914 0015 b600 1414
    060: 0015 b800 17b5 001a 2abb 001b 592b b600
    070: 1cb7 001d b500 1eb1
  Exception Handler Table:
    bci [14, 44] => handler: 47
  Stackmap Table:
    full_frame(@47,{Object[#176],Object[#177],Object[#178],Object[#179]},{Object[#180]})
    same_frame(@55)

    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:280)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:270)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:107)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:156)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
    at java.sec
```

[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-30 Thread darose
Github user darose commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41841626
  
FYI - I think I might have figured out why deleting the jets3t jar didn't 
fix the issue.  It looks like the spark build process bundles the jets3t 
classes into the spark assembly jar.  So I'm guessing that whacking the 
stand-alone jar file wouldn't fix the issue if there are still 0.7 classes 
bundled in another jar.




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-30 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41831614
  
@CodingCat I can make a patch, but it will mean introducing a new profile 
like "hadoop230" that one has to enable when building for Hadoop 2.3.0. I 
always hate to add that complexity and hope someone has a better idea. But I'll 
propose the PR if a committer nods and says it's worth changing. 

I imagine it won't be the last time the dependencies have to be fudged by 
Hadoop version -- isn't this already an existing issue with Avro anyway?




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-30 Thread darose
Github user darose commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41808390
  
Sigh.  Was a promising idea, but no dice.  Even with the 0.7 jars out of 
the way, I'm still getting:

java.lang.NoClassDefFoundError: org/jets3t/service/S3ServiceException
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:280)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:270)
    ...
    at shark.SharkCliDriver.main(SharkCliDriver.scala)





[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-30 Thread CodingCat
Github user CodingCat commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41804895
  
Hi @srowen, do you want to take over the patch? I'm concerned that I can't 
fix it in the next few days, given my schedule and my limited knowledge of 
mvn and sbt.




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-30 Thread darose
Github user darose commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41804694
  
Definitely worth a shot!  Will give that a try and report back.




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-30 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41804304
  
@darose what about removing the library from the assembly entirely, so 
that there is no copy in your app or in the deployed Spark jars? It may not be 
a viable solution in general, but it may well work for you if it's picking up 
the jar from the Hadoop installation. Worth a shot?




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-30 Thread darose
Github user darose commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41797308
  
What I can confirm is that trying to remove the jets3t 0.7 jars from the 
CDH spark-core package and replace them with 0.9 jars doesn't fix the issue.  
(I'm guessing because spark was built against the 0.7 jars.)  Results in a 
verifier error:

  Location:
    org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore.initialize(Ljava/net/URI;Lorg/apache/hadoop/conf/Configuration;)V @38: invokespecial
  Reason:
    Type 'org/jets3t/service/security/AWSCredentials' (current frame, stack[3]) is not assignable to 'org/jets3t/service/security/ProviderCredentials'

So what options do I have to get spark working on Hadoop 2.3 until 
SPARK-1556 gets fixed (and deployed to an update of CDH)?  I'm guessing my 
only recourse is to build spark from source, after tweaking the 
project/SparkBuild.scala file to update the dependency to 
`"net.java.dev.jets3t" % "jets3t" % "0.9.0"`?




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-30 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41770038
  
@darose this can be patched downstream, but that would not fix this for any 
other distro. Ideally, the dependency is set to 0.9.0 when built against Hadoop 
2.3.0+. As we've seen in other cases, it's possible to manage this with more 
profiles in the build -- a PITA, but certainly possible. (I don't know if this 
helps the SBT build though, but presumably some similar logic about the Hadoop 
version is possible there.)

The funny thing is that this dependency is only needed at runtime. (It really 
should be declared with runtime scope.) I am still not sure why 
hadoop-client doesn't package it. However, I wonder if, in the context of a 
Hadoop cluster, it's going to be on the classpath anyway, in which case it 
would be the right version in all cases. What if you change the scope, so 
jets3t is not even in the assembly?

I actually bet that works, and is simple. However I think it means that s3 
would no longer work if running Spark by itself, so is probably a non-starter.

So, a new hadoop2.3.0 profile that we try to trigger based on well-known 
hadoop.version values?




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-29 Thread darose
Github user darose commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41764125
  
So @srowen, I think @mateiz is right, the CDH5 spark-core package (on 
Ubuntu, it's version 0.9.0+cdh5.0.0+31-1.cdh5.0.0.p0.31~precise-cdh5.0.0) won't 
function correctly due to this issue and so would need to get rebuilt against 
jets3t 0.9.0.




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-29 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41748394
  
@CodingCat the problem is that on worker nodes there will be the wrong 
jets3t in the Spark JAR.




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-29 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41748362
  
BTW the right way to do it would be to make hadoop-client have a Maven 
dependency on the right version of Jets3t. Then Spark would just build with the 
right version out of the box when it linked to the right Hadoop version.




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-29 Thread CodingCat
Github user CodingCat commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41747796
  
@mateiz for @darose's question, how about compiling the application against 
a customized Spark jar (with the newer jets3t)? I think in that case he does 
not need to restart the cluster?




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-29 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41747427
  
You can try adding jets3t 0.9 as a Maven dependency in your application, 
but unfortunately I think that goes after the Spark assembly JAR when running 
an app. In 1.0 there will be a setting to put the user's classpath first.

It sounds like the Spark bundle for CDH needs to be updated with this; 
CCing @srowen.

For this patch, we probably want to create a new Maven profile to use a new 
Jets3t when that's enabled.
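
The ordering issue described above can be illustrated with a toy model. This is only a sketch of first-match-wins lookup, not Spark's actual class loader; the jar names are hypothetical, and the 0.7.1/0.9.0 versions are taken from elsewhere in this thread.

```python
# Toy model of first-match-wins class loading; NOT Spark's real class loader.
def resolve(classpath, class_name):
    """Return (jar_name, version) for the first jar that provides class_name."""
    for jar_name, provided in classpath:
        if class_name in provided:
            return jar_name, provided[class_name]
    raise LookupError(class_name)


REST_S3 = "org.jets3t.service.impl.rest.httpclient.RestS3Service"

# Hypothetical jar names, used for illustration only.
spark_assembly = ("spark-assembly.jar", {REST_S3: "0.7.1"})
user_app = ("my-app.jar", {REST_S3: "0.9.0"})

default_order = [spark_assembly, user_app]  # app jar goes after the assembly
user_first = [user_app, spark_assembly]     # what a "user classpath first" setting would give

print(resolve(default_order, REST_S3))
print(resolve(user_first, REST_S3))
```

With the default order the older class from the assembly shadows the one shipped with the application, which is why adding jets3t 0.9 to the application alone may not help.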




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-29 Thread CodingCat
Github user CodingCat commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41734924
  
I think one possible way to do that is to compile a jets3t 0.9.0-enabled 
version of Spark yourself,

then compile your application against that version. I think that to access an 
HDFS-compatible fs, we eventually call the code in the application jar.




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-29 Thread darose
Github user darose commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41732946
  
Is there any way to apply this fix without a rebuild of spark?  E.g., to 
just replace jets3t-0.7.1.jar with jets3t-0.9.0.jar in a deployed spark 
package?  I'm running into this issue on a machine where I have the CDH5 hadoop 
and spark packages installed.
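
Before swapping anything, it may help to see which jets3t jars a deployed package actually ships. The following is a minimal sketch that assumes the jar keeps the conventional `jets3t-<version>.jar` file name; it will not detect jets3t classes that were merged into an assembly JAR, and the directory passed to `scan` is whatever your installation uses.

```python
import re
from pathlib import Path

# Matches conventional names such as jets3t-0.7.1.jar (an assumption about naming).
JETS3T_JAR = re.compile(r"^jets3t-(\d+(?:\.\d+)*)\.jar$")


def jets3t_versions(file_names):
    """Map each jets3t jar path to the version embedded in its file name."""
    found = {}
    for name in file_names:
        match = JETS3T_JAR.match(Path(name).name)
        if match:
            found[name] = match.group(1)
    return found


def scan(root):
    """Walk a directory tree (for example an install's lib directory) for jets3t jars."""
    return jets3t_versions(str(p) for p in Path(root).rglob("jets3t-*.jar"))
```

For example, `jets3t_versions(["lib/jets3t-0.7.1.jar", "lib/guava-14.0.1.jar"])` reports only the jets3t entry.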




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41640658
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14551/




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41640655
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-28 Thread CodingCat
Github user CodingCat commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41639362
  
I restored the build files and updated the documentation to describe this 
situation for users.




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41639352
  
Merged build started. 




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41639344
  
Merged build triggered.




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-28 Thread CodingCat
Github user CodingCat commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41639298
  
@mateiz you are right, I received the exception 
`java.lang.NoSuchMethodError: org.jets3t.service.impl.rest.httpclient.RestS3Service.<init>(Lorg/jets3t/service/security/AWSCredentials;)V` 
in both 






[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-22 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41079837
  
Sure, that would work. Please try it. Unfortunately I remember it having 
problems, but I could be wrong.




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-22 Thread CodingCat
Github user CodingCat commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41079532
  
Hi @mateiz @srowen, if Spark built with Hadoop 1.0.4/2.x (x < 3) and 
jets3t 0.9.0 can access S3 smoothly, does that also mean that bumping to 0.9.0 
is safe?

I'm going to run a manual test tonight or tomorrow.




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-22 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41073254
  
Great, so there's no easy way to set it based on profiles and support all 
Hadoop versions :). Maybe for Hadoop 2.3+ users, we can just tell them to add a 
new version of jets3t to their own project's build? We can certainly have our 
pre-built binaries include the right one too.




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-22 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41064211
  
@mateiz It looks like it went to 0.8.1 in Hadoop 1.3.0 
(https://issues.apache.org/jira/browse/HADOOP-8136) and 0.9.0 in 2.3.0 
(https://issues.apache.org/jira/browse/HADOOP-9623)
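
As a rough sketch of that mapping, the version choice could be written as below. The 0.8.1 and 0.9.0 thresholds come from the two JIRAs above; treating 0.7.1 as the baseline for everything older is an assumption based on the `jets3t-0.7.1.jar` mentioned elsewhere in this thread, and the function names are made up for illustration.

```python
def parse_version(text):
    """Turn '2.3.0' or '2.3.0-cdh5.0.0' into a 3-tuple of ints, ignoring any suffix."""
    core = text.split("-", 1)[0]
    parts = [int(part) for part in core.split(".")]
    parts += [0] * (3 - len(parts))  # pad '2.3' to (2, 3, 0)
    return tuple(parts[:3])


def jets3t_for_hadoop(hadoop_version):
    """Pick the jets3t version that the given Hadoop release line expects."""
    version = parse_version(hadoop_version)
    if version[0] >= 2:
        # HADOOP-9623: 0.9.0 from Hadoop 2.3.0 onward.
        return "0.9.0" if version >= (2, 3, 0) else "0.7.1"  # 0.7.1 baseline is assumed
    # HADOOP-8136: 0.8.1 from Hadoop 1.3.0 onward.
    return "0.8.1" if version >= (1, 3, 0) else "0.7.1"  # 0.7.1 baseline is assumed
```

For instance, `jets3t_for_hadoop("1.0.4")` falls back to the assumed baseline, while `jets3t_for_hadoop("2.3.0")` selects 0.9.0.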




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-22 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41059208
  
In that case let's see exactly which Hadoop 2.x version bumped up the 
dependency, because I don't think 2.0 and 2.1 did it (could be wrong though).




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-22 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41014340
  
@mateiz I thought the same thing, that `hadoop-client` pulls this in, but 
it does not. Only things like `hadoop-hdfs`.

I agree with updating the dependency, but to match the Hadoop version. So 
the 0.9.0 version belongs in the Hadoop 2 profiles.

(Also it should be a runtime scope dependency in Maven.)




[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2014-04-22 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-41009471
  
Unfortunately this will not work in older Hadoop versions as far as I know. 
Can you still build Spark against Hadoop 1.0.4 and run it with this change?

It might be better to receive jets3t from Hadoop instead of depending on it 
ourselves. I'm not sure if hadoop-client depends on it...

