[jira] [Resolved] (HDFS-16684) Exclude self from JournalNodeSyncer when using a bind host

2022-08-28 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HDFS-16684.
--
Hadoop Flags: Reviewed
  Resolution: Fixed

Merged to trunk and branch-3.3. Resolving. Thanks for the nice contribution 
[~svaughan] 

> Exclude self from JournalNodeSyncer when using a bind host
> --
>
> Key: HDFS-16684
> URL: https://issues.apache.org/jira/browse/HDFS-16684
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: journal-node
>Affects Versions: 3.4.0, 3.3.9
> Environment: Running with Java 11 and bind addresses set to 0.0.0.0.
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> The JournalNodeSyncer will include the local instance in syncing when using a 
> bind host (e.g. 0.0.0.0).  There is a mechanism that is supposed to exclude 
> the local instance, but it doesn't recognize the meta-address as a local 
> address.
> Running with bind addresses set to 0.0.0.0, the JournalNodeSyncer will log 
> attempts to sync with itself as part of the normal syncing rotation.  For an 
> HA configuration running 3 JournalNodes, the "other" list used by the 
> JournalNodeSyncer will include 3 proxies.
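
For illustration, a minimal sketch of the kind of check involved (hypothetical names; the actual change is in the linked pull request): with a wildcard bind address, a candidate journal address has to be tested against the machine's real interfaces rather than compared to the configured meta-address.

{code:java}
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.SocketException;

// Hypothetical sketch only; names do not match the actual HDFS patch.
public final class JournalSelfCheck {
  /**
   * True if the candidate journal address refers to this host. A plain
   * equality test against a 0.0.0.0 bind address never matches, which is
   * how a syncer can end up keeping itself in its "other journals" list.
   */
  static boolean isSelf(InetSocketAddress candidate, int localPort) {
    if (candidate.getPort() != localPort || candidate.getAddress() == null) {
      return false;
    }
    InetAddress addr = candidate.getAddress();
    try {
      return addr.isAnyLocalAddress()          // 0.0.0.0 or ::
          || addr.isLoopbackAddress()          // 127.0.0.1 or ::1
          || NetworkInterface.getByInetAddress(addr) != null; // a local NIC
    } catch (SocketException e) {
      return false;
    }
  }
}
{code}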



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.3.4

2022-08-04 Thread Stack
+1 (Sorry, took me a while)

Ran: ./dev-support/hadoop-vote.sh --source
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/

* Signature: ok

* Checksum : failed

* Rat check (17.0.1): ok

 - mvn clean apache-rat:check

* Built from source (17.0.1): ok

 - mvn clean install  -DskipTests

* Built tar from source (17.0.1): ok

 - mvn clean package  -Pdist -DskipTests -Dtar
-Dmaven.javadoc.skip=true

Took a look at website. Home page says stuff like, “ARM Support: This is
the first release to support ARM architectures.”, which I don’t think is
true of 3.3.4 but otherwise, looks fine.

Only played with HDFS. UIs looked right.

Deployed to ten node arm64 cluster. Ran the hbase verification job on top
of it and all passed. Did some kills, stuff came back.

I didn't spend time on unit tests but one set passed on a local rig here:

[image: image.png]
Stack

On Fri, Jul 29, 2022 at 11:48 AM Steve Loughran 
wrote:

> I have put together a release candidate (RC1) for Hadoop 3.3.4
>
> The RC is available at:
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/
>
> The git tag is release-3.3.4-RC1, commit a585a73c3e0
>
> The maven artifacts are staged at
> https://repository.apache.org/content/repositories/orgapachehadoop-1358/
>
> You can find my public key at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> Change log
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/CHANGELOG.md
>
> Release notes
>
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/RELEASENOTES.md
>
> There's a very small number of changes, primarily critical code/packaging
> issues and security fixes.
>
> See the release notes for details.
>
> Please try the release and vote. The vote will run for 5 days.
>
> steve
>


[jira] [Resolved] (HDFS-16586) Purge FsDatasetAsyncDiskService threadgroup; it causes BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal exception and exit'

2022-05-25 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HDFS-16586.
--
Fix Version/s: 3.4.0
               3.2.4
               3.3.4
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to branch-3, branch-3.3, and to branch-3.2. Thank you for the review 
[~hexiaoqiao] 

> Purge FsDatasetAsyncDiskService threadgroup; it causes 
> BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal 
> exception and exit' 
> -
>
> Key: HDFS-16586
> URL: https://issues.apache.org/jira/browse/HDFS-16586
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.3.0, 3.2.3
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The below failed block finalize is causing a downstreamer's test to fail when 
> it uses hadoop 3.2.3 or 3.3.0+:
> {code:java}
> 2022-05-19T18:21:08,243 INFO  [Command processor] 
> impl.FsDatasetAsyncDiskService(234): Scheduling blk_1073741840_1016 replica 
> FinalizedReplica, blk_1073741840_1016, FINALIZED
>   getNumBytes()     = 52
>   getBytesOnDisk()  = 52
>   getVisibleLength()= 52
>   getVolume()       = 
> /Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2
>   getBlockURI()     = 
> file:/Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2/current/BP-62743752-127.0.0.1-1653009535881/current/finalized/subdir0/subdir0/blk_1073741840
>  for deletion
> 2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] 
> metrics.TopMetrics(134): a metric is reported: cmd: delete user: stack.hfs.0 
> (auth:SIMPLE)
> 2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] 
> top.TopAuditLogger(78): ------- logged event for top service: 
> allowed=true ugi=stack.hfs.0 (auth:SIMPLE) ip=/127.0.0.1 cmd=delete  
> src=/user/stack/test-data/b8167d53-bcd7-c682-a767-55faaf7f3e96/data/default/t1/4499521075f51d5138fe4f1916daf92d/.tmp
>   dst=null  perm=null
> 2022-05-19T18:21:08,243 DEBUG [PacketResponder: 
> BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, 
> type=LAST_IN_PIPELINE] datanode.BlockReceiver$PacketResponder(1645): 
> PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, 
> type=LAST_IN_PIPELINE, replyAck=seqno: 901 reply: SUCCESS 
> downstreamAckTimeNanos: 0 flag: 0
> 2022-05-19T18:21:08,243 DEBUG [PacketResponder: 
> BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, 
> type=LAST_IN_PIPELINE] datanode.BlockReceiver$PacketResponder(1327): 
> PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, 
> type=LAST_IN_PIPELINE: seqno=-2 waiting for local datanode to finish write.
> 2022-05-19T18:21:08,243 ERROR [Command processor] 
> datanode.BPServiceActor$CommandProcessingThread(1276): Command processor 
> encountered fatal exception and exit.
> java.lang.IllegalThreadStateException: null
>   at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:865) ~[?:?]
>   at java.lang.Thread.(Thread.java:430) ~[?:?]
>   at java.lang.Thread.(Thread.java:704) ~[?:?]
>   at java.lang.Thread.(Thread.java:525) ~[?:?]
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$1.newThread(FsDatasetAsyncDiskService.java:113)
>  ~[hadoop-hdfs-3.2.3.jar:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.(ThreadPoolExecutor.java:623)
>  ~[?:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:912)
>  ~[?:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1343) 
> ~[?:?]
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:189)
>  ~[hadoop-hdfs-3.2.3.jar:?]
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:238)
>  ~[hadoop-hdfs-3.2.3.jar:?]
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2184)
>  ~[hadoop-hdfs-3.2.3.jar:?]
>   a

[jira] [Created] (HDFS-16586) Purge FsDatasetAsyncDiskService threadgroup; it causes BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal exception and exit'

2022-05-20 Thread Michael Stack (Jira)
Michael Stack created HDFS-16586:


 Summary: Purge FsDatasetAsyncDiskService threadgroup; it causes 
BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal 
exception and exit' 
 Key: HDFS-16586
 URL: https://issues.apache.org/jira/browse/HDFS-16586
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.2.3, 3.3.0
Reporter: Michael Stack
Assignee: Michael Stack


The below failed block finalize is causing a downstreamer's test to fail when 
it uses hadoop 3.2.3 or 3.3.0+:
{code:java}
2022-05-19T18:21:08,243 INFO  [Command processor] 
impl.FsDatasetAsyncDiskService(234): Scheduling blk_1073741840_1016 replica 
FinalizedReplica, blk_1073741840_1016, FINALIZED
  getNumBytes()     = 52
  getBytesOnDisk()  = 52
  getVisibleLength()= 52
  getVolume()       = 
/Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2
  getBlockURI()     = 
file:/Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2/current/BP-62743752-127.0.0.1-1653009535881/current/finalized/subdir0/subdir0/blk_1073741840
 for deletion
2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] 
metrics.TopMetrics(134): a metric is reported: cmd: delete user: stack.hfs.0 
(auth:SIMPLE)
2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] 
top.TopAuditLogger(78): --- logged event for top service: 
allowed=true ugi=stack.hfs.0 (auth:SIMPLE) ip=/127.0.0.1 cmd=delete  
src=/user/stack/test-data/b8167d53-bcd7-c682-a767-55faaf7f3e96/data/default/t1/4499521075f51d5138fe4f1916daf92d/.tmp
  dst=null  perm=null
2022-05-19T18:21:08,243 DEBUG [PacketResponder: 
BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE] 
datanode.BlockReceiver$PacketResponder(1645): PacketResponder: 
BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE, 
replyAck=seqno: 901 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
2022-05-19T18:21:08,243 DEBUG [PacketResponder: 
BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE] 
datanode.BlockReceiver$PacketResponder(1327): PacketResponder: 
BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE: 
seqno=-2 waiting for local datanode to finish write.
2022-05-19T18:21:08,243 ERROR [Command processor] 
datanode.BPServiceActor$CommandProcessingThread(1276): Command processor 
encountered fatal exception and exit.
java.lang.IllegalThreadStateException: null
  at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:865) ~[?:?]
  at java.lang.Thread.(Thread.java:430) ~[?:?]
  at java.lang.Thread.(Thread.java:704) ~[?:?]
  at java.lang.Thread.(Thread.java:525) ~[?:?]
  at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$1.newThread(FsDatasetAsyncDiskService.java:113)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.(ThreadPoolExecutor.java:623)
 ~[?:?]
  at 
java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:912) 
~[?:?]
  at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1343) 
~[?:?]
  at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:189)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:238)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2184)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2103)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:736)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:682)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1318)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.lambda$enqueue$2(BPServiceActor.java:1364)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1291)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1274)
 ~[hadoop-hdfs-3.2.3.jar:?]
2022-05-19T18:21:08,243 DEBUG [DataXce
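
For reference, a minimal standalone sketch of the JDK behavior behind the exception above, assuming the mechanism is a daemon ThreadGroup (the daemon flag can also be inherited from the parent group): once its last thread exits, a daemon group destroys itself, and ThreadGroup.addUnstarted() then rejects any new thread. This is an illustration of the underlying JDK behavior on the Java versions discussed here, not the JIRA's reproducer.

{code:java}
// Sketch: a daemon ThreadGroup self-destructs when its last thread exits.
public class ThreadGroupRepro {
  public static void main(String[] args) throws Exception {
    ThreadGroup group = new ThreadGroup("async-disk-service");
    group.setDaemon(true); // in the real case this may be inherited

    Thread worker = new Thread(group, () -> { });
    worker.start();
    worker.join(); // last thread gone -> daemon group is destroyed

    // Throws java.lang.IllegalThreadStateException from
    // ThreadGroup.addUnstarted(), as in the log above.
    new Thread(group, () -> { }).start();
  }
}
{code}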

[jira] [Resolved] (HDFS-16540) Data locality is lost when DataNode pod restarts in kubernetes

2022-05-15 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HDFS-16540.
--
Hadoop Flags: Reviewed
  Resolution: Fixed

Merged to branch-3.3 and to trunk.

> Data locality is lost when DataNode pod restarts in kubernetes 
> ---
>
> Key: HDFS-16540
> URL: https://issues.apache.org/jira/browse/HDFS-16540
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.2
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.4
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> We have HBase RegionServer and Hdfs DataNode running in one pod. When the pod 
> restarts, we found that data locality is lost after we do a major compaction 
> of hbase regions. After some debugging, we found that upon pod restarts, its 
> ip changes. In DatanodeManager, maps like networktopology are updated with 
> the new info. host2DatanodeMap is not updated accordingly. When hdfs client 
> with the new ip tries to find a local DataNode, it fails. 
>  
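
For illustration, a sketch of the shape of such a fix (hypothetical names; the real change is in the linked pull request): on re-registration the host-keyed map has to drop the entry for the old pod IP and re-index the node under the new one, just as the topology map already does.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; real DatanodeManager types and fields differ.
class HostToDatanodeIndex {
  private final Map<String, String> ipToDatanodeUuid = new ConcurrentHashMap<>();

  void onRegistration(String datanodeUuid, String oldIp, String newIp) {
    if (oldIp != null && !oldIp.equals(newIp)) {
      ipToDatanodeUuid.remove(oldIp); // drop the stale pod IP...
    }
    ipToDatanodeUuid.put(newIp, datanodeUuid); // ...and index the new one
  }

  String lookup(String clientIp) {
    return ipToDatanodeUuid.get(clientIp); // local-node lookup now succeeds
  }
}
{code}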



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.3.3 (RC1)

2022-05-12 Thread Stack
+1 (binding)

* Signature: ok
* Checksum : ok
* Rat check (10.0.2): ok
 - mvn clean apache-rat:check
* Built from source (10.0.2): ok
 - mvn clean install  -DskipTests
* Unit tests pass (10.0.2): ok
 - mvn package -P runAllTests  -Dsurefire.rerunFailingTestsCount=3


[INFO] Apache Hadoop Cloud Storage Project  SUCCESS [  0.026 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  12:51 h
[INFO] Finished at: 2022-05-12T06:25:19-07:00
[INFO] ------------------------------------------------------------------------

[WARNING] The requested profile "runAllTests" could not be activated
because it does not exist.

Built a downstreamer against this RC and ran it in-the-small. Seemed fine.

S


On Wed, May 11, 2022 at 10:25 AM Steve Loughran 
wrote:

> I have put together a release candidate (RC1) for Hadoop 3.3.3
>
> The RC is available at:
> https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC1/
>
> The git tag is release-3.3.3-RC1, commit d37586cbda3
>
> The maven artifacts are staged at
> https://repository.apache.org/content/repositories/orgapachehadoop-1349/
>
> You can find my public key at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> Change log
> https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC1/CHANGELOG.md
>
> Release notes
> https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC1/RELEASENOTES.md
>
> There's a very small number of changes, primarily critical code/packaging
> issues and security fixes.
>
> * The critical fixes which shipped in the 3.2.3 release.
> * CVEs in our code and dependencies
> * Shaded client packaging issues.
> * A switch from log4j to reload4j
>
> reload4j is an active fork of the log4j 1.2.17 library with the classes
> which contain CVEs removed. Even though hadoop never used those classes,
> they regularly raised alerts on security scans and concern from users.
> Switching to the forked project allows us to ship a secure logging
> framework. It will complicate the builds of downstream
> maven/ivy/gradle projects which exclude our log4j artifacts, as they
> need to cut the new dependency instead/as well.
>
> See the release notes for details.
>
> This is the second release attempt. It is the same git commit as before,
> but
> fully recompiled with another republish to maven staging, which has been
> verified by building spark, as well as a minimal test project.
>
> Please try the release and vote. The vote will run for 5 days.
>
> -Steve
>


Re: [VOTE] Release Apache Hadoop 3.3.3

2022-05-06 Thread Stack
+1 (binding)

  * Signature: ok
  * Checksum : passed
  * Rat check (1.8.0_191): passed
   - mvn clean apache-rat:check
  * Built from source (1.8.0_191): failed
   - mvn clean install  -DskipTests
   - mvn -fae --no-transfer-progress -DskipTests -Dmaven.javadoc.skip=true
-Pnative -Drequire.openssl -Drequire.snappy -Drequire.valgrind
-Drequire.zstd -Drequire.test.libhadoop clean install
  * Unit tests pass (1.8.0_191):
- HDFS Tests passed (Didn't run more than this).

Deployed a ten node ha hdfs cluster with three namenodes and five
journalnodes. Ran a ten node hbase (older version of 2.5 branch built
against 3.3.2) against it. Tried a small verification job. Good. Ran a
bigger job with mild chaos. All seems to be working properly (recoveries,
logs look fine). Killed a namenode. Failover worked promptly. UIs look
good. Poked at the hdfs cli. Seems good.

S

On Tue, May 3, 2022 at 4:24 AM Steve Loughran 
wrote:

> I have put together a release candidate (rc0) for Hadoop 3.3.3
>
> The RC is available at:
> https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC0/
>
> The git tag is release-3.3.3-RC0, commit d37586cbda3
>
> The maven artifacts are staged at
> https://repository.apache.org/content/repositories/orgapachehadoop-1348/
>
> You can find my public key at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> Change log
> https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC0/CHANGELOG.md
>
> Release notes
> https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC0/RELEASENOTES.md
>
> There's a very small number of changes, primarily critical code/packaging
> issues and security fixes.
>
>
>- The critical fixes which shipped in the 3.2.3 release.
>-  CVEs in our code and dependencies
>- Shaded client packaging issues.
>- A switch from log4j to reload4j
>
>
> reload4j is an active fork of the log4j 1.2.17 library with the classes which
> contain CVEs removed. Even though hadoop never used those classes, they
> regularly raised alerts on security scans and concern from users. Switching
> to the forked project allows us to ship a secure logging framework. It will
> complicate the builds of downstream maven/ivy/gradle projects which exclude
> our log4j artifacts, as they need to cut the new dependency instead/as
> well.
>
> See the release notes for details.
>
> This is my first release through the new docker build process, do please
> validate artifact signing &c to make sure it is good. I'll be trying builds
> of downstream projects.
>
> We know there are some outstanding issues with at least one library we are
> shipping (okhttp), but I don't want to hold this release up for it. If the
> docker based release process works smoothly enough we can do a followup
> security release in a few weeks.
>
> Please try the release and vote. The vote will run for 5 days.
>
> -Steve
>


Re: [VOTE] Release Apache Hadoop 3.3.2 - RC5

2022-02-22 Thread Stack
+1

Verified checksums, signatures, and rat-check are good.

Built (RC4) locally from source and ran a small hdfs cluster with hbase on
top. Ran an hbase upload w/ chaos and verification and hdfs seemed to do
the right thing.

S

On Mon, Feb 21, 2022 at 9:17 PM Chao Sun  wrote:

> Hi all,
>
> Here's Hadoop 3.3.2 release candidate #5:
>
> The RC is available at: http://people.apache.org/~sunchao/hadoop-3.3.2-RC5
> The RC tag is at:
> https://github.com/apache/hadoop/releases/tag/release-3.3.2-RC5
> The Maven artifacts are staged at:
> https://repository.apache.org/content/repositories/orgapachehadoop-1335
>
> You can find my public key at:
> https://downloads.apache.org/hadoop/common/KEYS
>
> CHANGELOG is the only difference between this and RC4. Therefore, the tests
> I've done in RC4 are still valid:
> - Ran all the unit tests
> - Started a single node HDFS cluster and tested a few simple commands
> - Ran all the tests in Spark using the RC5 artifacts
>
> Please evaluate the RC and vote, thanks!
>
> Best,
> Chao
>


Re: [VOTE] Release Apache Hadoop 3.3.2 - RC2

2022-01-24 Thread Stack
+1 (binding)

* Signature: ok
* Checksum : ok
* Rat check (1.8.0_191): ok
 - mvn clean apache-rat:check
* Built from source (1.8.0_191): ok
 - mvn clean install  -DskipTests

Poking around in the binary, it looks good. Unpacked site. Looks right.
Checked a few links work.

Deployed over ten node cluster. Ran HBase ITBLL over it for a few hours w/
chaos. Worked like 3.3.1...

I tried to build with 3.8.1 maven and got the below.

[ERROR] Failed to execute goal on project
hadoop-yarn-applications-catalog-webapp: Could not resolve dependencies for
project
org.apache.hadoop:hadoop-yarn-applications-catalog-webapp:war:3.3.2: Failed
to collect dependencies at org.apache.solr:solr-core:jar:7.7.0 ->
org.restlet.jee:org.restlet:jar:2.3.0: Failed to read artifact descriptor
for org.restlet.
jee:org.restlet:jar:2.3.0: Could not transfer artifact
org.restlet.jee:org.restlet:pom:2.3.0 from/to maven-default-http-blocker (
http://0.0.0.0/): Blocked mirror for repositories: [maven-restlet (
http://maven.restlet.org, default, releases+snapshots), apache.snapshots (
http://repository.apache.org/snapshots, default, disabled)] -> [Help 1]

I used 3.6.3 mvn instead (looks like a simple fix).

Thanks for packaging up this fat point release Chao Sun.

S

On Wed, Jan 19, 2022 at 9:50 AM Chao Sun  wrote:

> Hi all,
>
> I've put together Hadoop 3.3.2 RC2 below:
>
> The RC is available at:
> http://people.apache.org/~sunchao/hadoop-3.3.2-RC2/
> The RC tag is at:
> https://github.com/apache/hadoop/releases/tag/release-3.3.2-RC2
> The Maven artifacts are staged at:
> https://repository.apache.org/content/repositories/orgapachehadoop-1332
>
> You can find my public key at:
> https://downloads.apache.org/hadoop/common/KEYS
>
> I've done the following tests and they look good:
> - Ran all the unit tests
> - Started a single node HDFS cluster and tested a few simple commands
> - Ran all the tests in Spark using the RC2 artifacts
>
> Please evaluate the RC and vote, thanks!
>
> Best,
> Chao
>


Re: [VOTE] Release Apache Hadoop 3.3.1 RC3

2021-06-09 Thread Stack
+1



* Signature: ok

* Checksum : ok

* Rat check (1.8.0_191): ok

 - mvn clean apache-rat:check

* Built from source (1.8.0_191): ok

 - mvn clean install -DskipTests


Ran a ten node cluster w/ hbase on top running its verification loadings w/
(gentle) chaos. Had trouble getting the rig running but mostly pilot error
and none that I could particularly attribute to hdfs after poking in logs.

Messed in UI and shell some. Nothing untoward.

Wei-Chiu fixed broken tests over in hbase and complete runs are pretty much
there (a classic flakie seems more-so on 3.3.1... will dig in more on why).


Thanks,

S


On Tue, Jun 1, 2021 at 3:29 AM Wei-Chiu Chuang  wrote:

> Hi community,
>
> This is the release candidate RC3 of Apache Hadoop 3.3.1 line. All blocker
> issues have been resolved [1] again.
>
> There are 2 additional issues resolved for RC3:
> * Revert "MAPREDUCE-7303. Fix TestJobResourceUploader failures after
> HADOOP-16878
> * Revert "HADOOP-16878. FileUtil.copy() to throw IOException if the source
> and destination are the same
>
> There are 4 issues resolved for RC2:
> * HADOOP-17666. Update LICENSE for 3.3.1
> * MAPREDUCE-7348. TestFrameworkUploader#testNativeIO fails. (#3053)
> * Revert "HADOOP-17563. Update Bouncy Castle to 1.68. (#2740)" (#3055)
> * HADOOP-17739. Use hadoop-thirdparty 1.1.1. (#3064)
>
> The Hadoop-thirdparty 1.1.1, as previously mentioned, contains two extra
> fixes compared to hadoop-thirdparty 1.1.0:
> * HADOOP-17707. Remove jaeger document from site index.
> * HADOOP-17730. Add back error_prone
>
> *RC tag is release-3.3.1-RC3
> https://github.com/apache/hadoop/releases/tag/release-3.3.1-RC3
>
> *The RC3 artifacts are at*:
> https://home.apache.org/~weichiu/hadoop-3.3.1-RC3/
> ARM artifacts: https://home.apache.org/~weichiu/hadoop-3.3.1-RC3-arm/
>
> *The maven artifacts are hosted here:*
> https://repository.apache.org/content/repositories/orgapachehadoop-1320/
>
> *My public key is available here:*
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
>
> Things I've verified:
> * all blocker issues targeting 3.3.1 have been resolved.
> * stable/evolving API changes between 3.3.0 and 3.3.1 are compatible.
> * LICENSE and NOTICE files checked
> * RELEASENOTES and CHANGELOG
> * rat check passed.
> * Built HBase master branch on top of Hadoop 3.3.1 RC2, ran unit tests.
> * Built Ozone master on top fo Hadoop 3.3.1 RC2, ran unit tests.
> * Extra: built 50 other open source projects on top of Hadoop 3.3.1 RC2.
> Had to patch some of them due to commons-lang migration (Hadoop 3.2.0) and
> dependency divergence. Issues are being identified but so far nothing
> blocker for Hadoop itself.
>
> Please try the release and vote. The vote will run for 5 days.
>
> My +1 to start,
>
> [1] https://issues.apache.org/jira/issues/?filter=12350491
> [2]
>
> https://github.com/apache/hadoop/compare/release-3.3.1-RC1...release-3.3.1-RC3
>


Re: [VOTE] hadoop-thirdparty 1.1.0-RC0

2021-05-13 Thread Stack
+1

* I verified src tgz is signed with the key from
https://people.apache.org/keys/committer/weichiu.asc
* Verified hash.
* Built from src w/ -Prelease profile
* Checked CHANGES against git log.

S




On Thu, May 13, 2021 at 12:55 PM Wei-Chiu Chuang  wrote:

> Hello my fellow Hadoop developers,
>
> I am putting together the first release candidate (RC0) for
> Hadoop-thirdparty 1.1.0. This is going to be consumed by the upcoming
> Hadoop 3.3.1 release.
>
> The RC is available at:
> https://people.apache.org/~weichiu/hadoop-thirdparty-1.1.0-RC0/
> The RC tag in github is here:
> https://github.com/apache/hadoop-thirdparty/tree/release-1.1.0-RC0
> The maven artifacts are staged at:
> https://repository.apache.org/content/repositories/orgapachehadoop-1309/
>
> You can find my public key at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS or
> https://people.apache.org/keys/committer/weichiu.asc
>
>
> Please try the release and vote. The vote will run for 5 days until
> 2021/05/19 at 00:00 CST.
>
> Note: Our post commit automation builds the code, and pushes the SNAPSHOT
> artifacts to central Maven, which is consumed by Hadoop trunk and
> branch-3.3, so it is a good validation that things are working properly in
> hadoop-thirdparty.
>
> Thanks,
> Wei-Chiu
>


Re: [DISCUSS] Hadoop 3.3.1 release

2021-02-08 Thread Stack
On Wed, Feb 3, 2021 at 6:41 AM Steve Loughran 
wrote:

>
> Regarding blockers &c: how about we have a little hackathon where we try
> and get things in. This means a promise of review time from the people with
> commit rights and other people who understand the code (Stack?)
>
>

I'm up for helping get 3.3.1 out (reviewing, hackathon, testing).
Thanks,
S




> -steve
>
> On Thu, 28 Jan 2021 at 06:48, Ayush Saxena  wrote:
>
> > +1
> > Just to mention we would need to release hadoop-thirdparty too before.
> > Presently we are using the snapshot version of it.
> >
> > -Ayush
> >
> > > On 28-Jan-2021, at 6:59 AM, Wei-Chiu Chuang 
> wrote:
> > >
> > > Hi all,
> > >
> > > Hadoop 3.3.0 was released half a year ago, and as of now we've
> > accumulated
> > > more than 400 changes in the branch-3.3. A number of downstreamers are
> > > eagerly waiting for 3.3.1 which addresses the guava version conflict
> > issue.
> > >
> > >
> >
> https://issues.apache.org/jira/issues/?filter=-1&jql=project%20in%20(HDFS%2C%20HADOOP%2C%20YARN%2C%20MAPREDUCE)%20and%20fixVersion%20in%20(3.3.1)%20and%20status%20%3D%20Resolved%20
> > >
> > > We should start the release work for 3.3.1 before the diff becomes even
> > > larger.
> > >
> > > I believe there are  currently only two real blockers for a 3.3.1
> (using
> > > this filter
> > >
> >
> https://issues.apache.org/jira/issues/?filter=-1&jql=project%20in%20(HDFS%2C%20HADOOP%2C%20YARN%2C%20MAPREDUCE)%20AND%20cf%5B12310320%5D%20in%20(3.3.1)%20AND%20status%20not%20in%20(Resolved)%20ORDER%20BY%20priority%20DESC
> > > )
> > >
> > >
> > >   1. HDFS-15566 <https://issues.apache.org/jira/browse/HDFS-15566>
> > >   2. HADOOP-17112 <https://issues.apache.org/jira/browse/HADOOP-17112>
> > >
> > >
> > >
> > > Is there anyone who would volunteer to be the 3.3.1 RM?
> > >
> > > Also, the HowToRelease wiki does not describe the ARM build process.
> > That's
> > > going to be important for future releases.
> >
> > -
> > To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
> >
> >
>


Re: [DISCUSS] Hadoop 3.3.1 release

2021-01-27 Thread Stack
Thanks for bringing up the topic Wei-Chiu. +1 on a 3.3.1 soon.

Was going to spend time testing

Yours,
S

On Wed, Jan 27, 2021 at 5:28 PM Wei-Chiu Chuang  wrote:

> Hi all,
>
> Hadoop 3.3.0 was released half a year ago, and as of now we've accumulated
> more than 400 changes in the branch-3.3. A number of downstreamers are
> eagerly waiting for 3.3.1 which addresses the guava version conflict issue.
>
>
> https://issues.apache.org/jira/issues/?filter=-1&jql=project%20in%20(HDFS%2C%20HADOOP%2C%20YARN%2C%20MAPREDUCE)%20and%20fixVersion%20in%20(3.3.1)%20and%20status%20%3D%20Resolved%20
>
> We should start the release work for 3.3.1 before the diff becomes even
> larger.
>
> I believe there are  currently only two real blockers for a 3.3.1 (using
> this filter
>
> https://issues.apache.org/jira/issues/?filter=-1&jql=project%20in%20(HDFS%2C%20HADOOP%2C%20YARN%2C%20MAPREDUCE)%20AND%20cf%5B12310320%5D%20in%20(3.3.1)%20AND%20status%20not%20in%20(Resolved)%20ORDER%20BY%20priority%20DESC
> )
>
>
>1. HDFS-15566 <https://issues.apache.org/jira/browse/HDFS-15566>
>2. HADOOP-17112 <https://issues.apache.org/jira/browse/HADOOP-17112>
>
>
>
> Is there anyone who would volunteer to be the 3.3.1 RM?
>
> Also, the HowToRelease wiki does not describe the ARM build process. That's
> going to be important for future releases.
>


[jira] [Resolved] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9

2019-07-01 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HDFS-14585.
--
Resolution: Fixed

Reapplied w/ proper commit message. Re-resolving.

> Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9
> -
>
> Key: HDFS-14585
> URL: https://issues.apache.org/jira/browse/HDFS-14585
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 2.10.0, 2.9.3
>
> Attachments: HDFS-14585.branch-2.9.v1.patch, 
> HDFS-14585.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v2.patch, 
> HDFS-14585.branch-2.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9

2019-07-01 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HDFS-14585:
--

Reopening. Commit message was missing the JIRA # so revert and reapply with 
fixed commit message.

> Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9
> -
>
> Key: HDFS-14585
> URL: https://issues.apache.org/jira/browse/HDFS-14585
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 2.10.0, 2.9.3
>
> Attachments: HDFS-14585.branch-2.9.v1.patch, 
> HDFS-14585.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v2.patch, 
> HDFS-14585.branch-2.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: Please grant JIRA contributor permission to stakiar and openinx

2019-02-18 Thread Stack
I added you fellows to hadoop common and to hadoop hdfs. Shout if it don't
work, Zheng Hu.
S

On Mon, Feb 18, 2019 at 7:08 PM OpenInx  wrote:

> Dear hdfs-dev:
>
>stakiar has been working on this issue:
> https://issues.apache.org/jira/browse/HDFS-3246, but he
>has no permission to attach his patch and run hadoop QA.
>
> And I'm working on HBASE-21879, which depends on the ByteBuffer pread
> interface, and I think
> it'll be a great p999-latency improvement for the 100% get/scan case in
> HBase.
>
>Could anyone help to grant the JIRA contributor permission to us? So
> we can move this task
>as fast as possible :-)
>Our JIRA ids are: stakiar / openinx
>
>Thanks.
>


Re: [DISCUSSION] Create a branch to work on non-blocking access to HDFS

2018-05-31 Thread Stack
Just to close the loop, I just made a branch named HDFS-13572 to match the
new non-blocking issue (after some nice encouragement posted up on the
JIRA).
Thanks,
S

On Tue, May 15, 2018 at 9:30 PM, Stack  wrote:

> On Fri, May 4, 2018 at 5:47 AM, Anu Engineer 
> wrote:
>
>> Hi Stack,
>>
>>
>>
>> Why don’t we look at the design of what is being proposed?  Let us post
>> the design to HDFS-9924 and then if needed, by all means let us open a new
>> Jira.
>>
>> That will make it easy to understand the context if someone is looking at
>> HDFS-9924.
>>
>>
>>
>
> I posted a WIP design-for-discussion up on a new issue, HDFS-13572, after
> spending a bunch of time in HDFS-9924 and HADOOP-12910 (Duo had posted an
> earlier version on HDFS-9924 a while back).
>
> HDFS-9924 is stalled. It is filled with "discussion" that seems mostly to
> be behind where we'd like to take-off (i.e. whether hadoop2 or hadoop3
> first, what is an async api, what is async programming, etc.). We hope to
> 'vault' HDFS-9924 by skipping to an hadoop3/jdk8/CompletableFuture basis
> and by taking on contributor requests in HDFS-9924 -- e.g. a design first,
> dev in a feature branch, and so on -- EXCEPTing the hadoop2 targeting.
>
> Hence the new issue for a new undertaking (and to save folks having to
> wade through reams to get to the new effort).
>
>
>
>> I personally believe that it should be the developers of the feature that
>> should decide what goes in, what to call the branch etc. But It would be
>> nice to have
>>
>> some sort of continuity of HDFS-9924.
>>
>>
>>
>
> Agree with the above. I'll take care of tying HDFS-9924 over to the new
> issue.
>
> Thanks,
> St.Ack
>
>
>
>> Thanks
>>
>> Anu
>>
>>
>>
>> *From: * on behalf of Stack 
>> *Date: *Thursday, May 3, 2018 at 9:04 PM
>> *To: *Anu Engineer 
>> *Cc: *Wei-Chiu Chuang , "hdfs-dev@hadoop.apache.org"
>> 
>> *Subject: *Re: [DISCUSSION] Create a branch to work on non-blocking
>> access to HDFS
>>
>>
>>
>> Thanks for support Wei-Chiu and Anu.
>>
>>
>>
>> Thinking more on it, we should just open a new JIRA. HDFS-9924 is an old
>> branch with commits we don't need full of commentary that is, ahem, a mite
>> off-topic.  Duo can attach his design to the new issue. We can cite
>> HDFS-9924 as provenance and aggregate the discussion as launching pad for
>> the new effort in new issue.
>>
>>
>>
>> Hopefully this is agreeable,
>>
>> Thanks,
>>
>>
>>
>> S
>>
>>
>>
>> On Thu, May 3, 2018 at 1:54 PM, Anu Engineer 
>> wrote:
>>
>> Hi St.ack/Wei-Chiu,
>>
>> It is very kind of St.Ack to bring this question to HDFS Dev. I think
>> this is a good feature to have. As for the branch question,
>> HDFS-9924 branch is already open, we could just use that and I am +1 on
>> adding Duo as a branch committer.
>>
>> I am not familiar with HBase code base, I am presuming that there will be
>> some deviation from the current design
>> doc posted in HDFS-9924. Would it make sense to post a new design
>> proposal on HDFS-9924?
>>
>> --Anu
>>
>>
>>
>>
>> On 5/3/18, 9:29 AM, "Wei-Chiu Chuang"  wrote:
>>
>> Given that HBase 2 uses async output by default, the way that code is
>> maintained today in HBase is not sustainable. That piece of code
>> should be
>> maintained in HDFS. I am +1 as a participant in both communities.
>>
>> On Thu, May 3, 2018 at 9:14 AM, Stack  wrote:
>>
>> > Ok with you lot if a few of us open a branch to work on a
>> non-blocking HDFS
>> > client?
>> >
>> > Intent is to finish up the old issue "HDFS-9924 [umbrella]
>> Nonblocking HDFS
>> > Access". On the foot of this umbrella JIRA is a proposal by the
>> > heavy-lifter, Duo Zhang. Over in HBase, we have a limited async DFS
>> client
>> > (written by Duo) that we use making Write-Ahead Logs. We call it
>> > AsyncFSWAL. It was shipped as the default WAL writer in hbase-2.0.0.
>> >
>> > Let me quote Duo from his proposal at the base of HDFS-9924:
>> >
>> > We use lots of internal APIs of HDFS to implement the
>> AsyncFSWAL, so it
>> > is expected that things like HBASE-20244
>> > <https://issues.apache.org/jira/browse/HBASE-20244>
>> &

[jira] [Resolved] (HDFS-13565) [um

2018-05-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HDFS-13565.
--
Resolution: Invalid

Smile [~ebadger]

Yeah, sorry about that lads. Bad wifi. Resolving as invalid.



> [um
> ---
>
> Key: HDFS-13565
> URL: https://issues.apache.org/jira/browse/HDFS-13565
> Project: Hadoop HDFS
>  Issue Type: New Feature
>    Reporter: stack
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [DISCUSSION] Create a branch to work on non-blocking access to HDFS

2018-05-15 Thread Stack
On Fri, May 4, 2018 at 5:47 AM, Anu Engineer 
wrote:

> Hi Stack,
>
>
>
> Why don’t we look at the design of what is being proposed?  Let us post
> the design to HDFS-9924 and then if needed, by all means let us open a new
> Jira.
>
> That will make it easy to understand the context if someone is looking at
> HDFS-9924.
>
>
>

I posted a WIP design-for-discussion up on a new issue, HDFS-13572, after
spending a bunch of time in HDFS-9924 and HADOOP-12910 (Duo had posted an
earlier version on HDFS-9924 a while back).

HDFS-9924 is stalled. It is filled with "discussion" that seems mostly to
be behind where we'd like to take-off (i.e. whether hadoop2 or hadoop3
first, what is an async api, what is async programming, etc.). We hope to
'vault' HDFS-9924 by skipping to an hadoop3/jdk8/CompletableFuture basis
and by taking on contributor requests in HDFS-9924 -- e.g. a design first,
dev in a feature branch, and so on -- EXCEPTing the hadoop2 targeting.

Hence the new issue for a new undertaking (and to save folks having to wade
through reams to get to the new effort).



> I personally believe that it should be the developers of the feature that
> should decide what goes in, what to call the branch etc. But It would be
> nice to have
>
> some sort of continuity of HDFS-9924.
>
>
>

Agree with the above. I'll take care of tying HDFS-9924 over to the new
issue.

Thanks,
St.Ack



> Thanks
>
> Anu
>
>
>
> *From: * on behalf of Stack 
> *Date: *Thursday, May 3, 2018 at 9:04 PM
> *To: *Anu Engineer 
> *Cc: *Wei-Chiu Chuang , "hdfs-dev@hadoop.apache.org" <
> hdfs-dev@hadoop.apache.org>
> *Subject: *Re: [DISCUSSION] Create a branch to work on non-blocking
> access to HDFS
>
>
>
> Thanks for support Wei-Chiu and Anu.
>
>
>
> Thinking more on it, we should just open a new JIRA. HDFS-9924 is an old
> branch with commits we don't need full of commentary that is, ahem, a mite
> off-topic.  Duo can attach his design to the new issue. We can cite
> HDFS-9924 as provenance and aggregate the discussion as launching pad for
> the new effort in new issue.
>
>
>
> Hopefully this is agreeable,
>
> Thanks,
>
>
>
> S
>
>
>
> On Thu, May 3, 2018 at 1:54 PM, Anu Engineer 
> wrote:
>
> Hi St.ack/Wei-Chiu,
>
> It is very kind of St.Ack to bring this question to HDFS Dev. I think this
> is a good feature to have. As for the branch question,
> HDFS-9924 branch is already open, we could just use that and I am +1 on
> adding Duo as a branch committer.
>
> I am not familiar with HBase code base, I am presuming that there will be
> some deviation from the current design
> doc posted in HDFS-9924. Would it make sense to post a new design
> proposal on HDFS-9924?
>
> --Anu
>
>
>
>
> On 5/3/18, 9:29 AM, "Wei-Chiu Chuang"  wrote:
>
> Given that HBase 2 uses async output by default, the way that code is
> maintained today in HBase is not sustainable. That piece of code
> should be
> maintained in HDFS. I am +1 as a participant in both communities.
>
> On Thu, May 3, 2018 at 9:14 AM, Stack  wrote:
>
> > Ok with you lot if a few of us open a branch to work on a
> non-blocking HDFS
> > client?
> >
> > Intent is to finish up the old issue "HDFS-9924 [umbrella]
> Nonblocking HDFS
> > Access". On the foot of this umbrella JIRA is a proposal by the
> > heavy-lifter, Duo Zhang. Over in HBase, we have a limited async DFS
> client
> > (written by Duo) that we use making Write-Ahead Logs. We call it
> > AsyncFSWAL. It was shipped as the default WAL writer in hbase-2.0.0.
> >
> > Let me quote Duo from his proposal at the base of HDFS-9924:
> >
> > We use lots of internal APIs of HDFS to implement the
> AsyncFSWAL, so it
> > is expected that things like HBASE-20244
> > <https://issues.apache.org/jira/browse/HBASE-20244>
> > ["NoSuchMethodException
> > when retrieving private method decryptEncryptedDataEncryptionKey
> from
> > DFSClient"] will happen again and again.
> >
> > To make life easier, we need to move the async output related code
> into
> > HDFS. The POC [attached as patch on HDFS-9924] shows that option 3
> [1] can
> > work, so I would like to create a feature branch to implement the
> async dfs
> > client. In general I think there are 4 steps:
> >
> > 1. Implement an async rpc client with option 3 [1] described above.
> > 2. Implement the filesystem APIs which only need to connect to NN,
> such as
>

[jira] [Created] (HDFS-13572) [umbrella] Non-blocking HDFS Access for H3

2018-05-15 Thread stack (JIRA)
stack created HDFS-13572:


 Summary: [umbrella] Non-blocking HDFS Access for H3
 Key: HDFS-13572
 URL: https://issues.apache.org/jira/browse/HDFS-13572
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: fs async
Affects Versions: 3.0.0
Reporter: stack


An umbrella JIRA for supporting non-blocking HDFS access in h3.

This issue has provenance in the stalled HDFS-9924 but would like to vault over 
what was going on over there, in particular, focus on an async API for hadoop3+ 
unencumbered by worries about how to make it work in hadoop2.

Let me post a WIP design. Would love input/feedback (We make mention of the 
HADOOP-12910 call for spec but as future work -- hopefully that's ok). Was 
thinking of cutting a feature branch if all good after a bit of chat.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-13565) [um

2018-05-15 Thread stack (JIRA)
stack created HDFS-13565:


 Summary: [um
 Key: HDFS-13565
 URL: https://issues.apache.org/jira/browse/HDFS-13565
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: stack






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [DISCUSSION] Create a branch to work on non-blocking access to HDFS

2018-05-03 Thread Stack
Thanks for support Wei-Chiu and Anu.

Thinking more on it, we should just open a new JIRA. HDFS-9924 is an old
branch with commits we don't need full of commentary that is, ahem, a mite
off-topic.  Duo can attach his design to the new issue. We can cite
HDFS-9924 as provenance and aggregate the discussion as launching pad for
the new effort in new issue.

Hopefully this is agreeable,
Thanks,

S

On Thu, May 3, 2018 at 1:54 PM, Anu Engineer 
wrote:

> Hi St.ack/Wei-Chiu,
>
> It is very kind of St.Ack to bring this question to HDFS Dev. I think this
> is a good feature to have. As for the branch question,
> HDFS-9924 branch is already open, we could just use that and I am +1 on
> adding Duo as a branch committer.
>
> I am not familiar with HBase code base, I am presuming that there will be
> some deviation from the current design
> doc posted in HDFS-9924. Would it make sense to post a new design
> proposal on HDFS-9924?
>
> --Anu
>
>
>
> On 5/3/18, 9:29 AM, "Wei-Chiu Chuang"  wrote:
>
> Given that HBase 2 uses async output by default, the way that code is
> maintained today in HBase is not sustainable. That piece of code
> should be
> maintained in HDFS. I am +1 as a participant in both communities.
>
> On Thu, May 3, 2018 at 9:14 AM, Stack  wrote:
>
> > Ok with you lot if a few of us open a branch to work on a
> non-blocking HDFS
> > client?
> >
> > Intent is to finish up the old issue "HDFS-9924 [umbrella]
> Nonblocking HDFS
> > Access". On the foot of this umbrella JIRA is a proposal by the
> > heavy-lifter, Duo Zhang. Over in HBase, we have a limited async DFS
> client
> > (written by Duo) that we use making Write-Ahead Logs. We call it
> > AsyncFSWAL. It was shipped as the default WAL writer in hbase-2.0.0.
> >
> > Let me quote Duo from his proposal at the base of HDFS-9924:
> >
> > We use lots of internal APIs of HDFS to implement the
> AsyncFSWAL, so it
> > is expected that things like HBASE-20244
> > <https://issues.apache.org/jira/browse/HBASE-20244>
> > ["NoSuchMethodException
> > when retrieving private method decryptEncryptedDataEncryptionKey
> from
> > DFSClient"] will happen again and again.
> >
> > To make life easier, we need to move the async output related code
> into
> > HDFS. The POC [attached as patch on HDFS-9924] shows that option 3
> [1] can
> > work, so I would like to create a feature branch to implement the
> async dfs
> > client. In general I think there are 4 steps:
> >
> > 1. Implement an async rpc client with option 3 [1] described above.
> > 2. Implement the filesystem APIs which only need to connect to NN,
> such as
> > 'mkdirs'.
> > 3. Implement async file read. The problem is the API. For pread I
> think a
> > CompletableFuture is enough, the problem is for the streaming read.
> Need to
> > discuss later.
> > 4. Implement async file write. The API will also be a problem, but a
> more
> > important problem is that, if we want to support fan-out, the
> current logic
> > at DN side will make the semantic broken as we can read uncommitted
> data
> > very easily. In HBase it is solved by HBASE-14004
> > <https://issues.apache.org/jira/browse/HBASE-14004> but I do not
> think we
> > should keep the broken behavior in HDFS. We need to find a way to
> deal with
> > it.
> >
> > Comments welcome.
> >
> > Intent is to make a branch named HDFS-9924 (or should we just do a
> new
> > JIRA?) and to add Duo as a feature branch committer. If all goes
> well,
> > we'll call for a merge VOTE.
> >
> > Thanks,
> > St.Ack
> >
> > 1.Option 3:  "Use the old protobuf rpc interface and implement a new
> rpc
> > framework. The benefit is that we also do not need port unification
> service
> > at server side and do not need to maintain two implementations at
> server
> > side. And one more thing is that we do not need to upgrade protobuf
> to
> > 3.x."
> >
>
>
>
> --
> A very happy Hadoop contributor
>
>
>


[DISCUSSION] Create a branch to work on non-blocking access to HDFS

2018-05-03 Thread Stack
Ok with you lot if a few of us open a branch to work on a non-blocking HDFS
client?

Intent is to finish up the old issue "HDFS-9924 [umbrella] Nonblocking HDFS
Access". On the foot of this umbrella JIRA is a proposal by the
heavy-lifter, Duo Zhang. Over in HBase, we have a limited async DFS client
(written by Duo) that we use making Write-Ahead Logs. We call it
AsyncFSWAL. It was shipped as the default WAL writer in hbase-2.0.0.

Let me quote Duo from his proposal at the base of HDFS-9924:

We use lots of internal APIs of HDFS to implement the AsyncFSWAL, so it
is expected that things like HBASE-20244
 ["NoSuchMethodException
when retrieving private method decryptEncryptedDataEncryptionKey from
DFSClient"] will happen again and again.

To make life easier, we need to move the async output related code into
HDFS. The POC [attached as patch on HDFS-9924] shows that option 3 [1] can
work, so I would like to create a feature branch to implement the async dfs
client. In general I think there are 4 steps:

1. Implement an async rpc client with option 3 [1] described above.
2. Implement the filesystem APIs which only need to connect to NN, such as
'mkdirs'.
3. Implement async file read. The problem is the API. For pread I think a
CompletableFuture is enough, the problem is for the streaming read. Need to
discuss later.
4. Implement async file write. The API will also be a problem, but a more
important problem is that, if we want to support fan-out, the current logic
at DN side will make the semantic broken as we can read uncommitted data
very easily. In HBase it is solved by HBASE-14004
 but I do not think we
should keep the broken behavior in HDFS. We need to find a way to deal with
it.

Comments welcome.

Intent is to make a branch named HDFS-9924 (or should we just do a new
JIRA?) and to add Duo as a feature branch committer. If all goes well,
we'll call for a merge VOTE.

Thanks,
St.Ack

1.Option 3:  "Use the old protobuf rpc interface and implement a new rpc
framework. The benefit is that we also do not need port unification service
at server side and do not need to maintain two implementations at server
side. And one more thing is that we do not need to upgrade protobuf to 3.x."


Re: [DISCUSSION] Merging HDFS-7240 Object Store (Ozone) to trunk

2017-11-03 Thread Stack
On Sat, Oct 28, 2017 at 2:00 PM, Konstantin Shvachko 
wrote:

> Hey guys,
>
> It is an interesting question whether Ozone should be a part of Hadoop.
>


I don't see a direct answer to this question. Is there one? Pardon me if
I've not seen it but I'm interested in the response.

I ask because IMO the "Hadoop" project is over-stuffed already. Just see
the length of the cc list on this email. Ozone could be standalone. It is a
coherent enough effort.

Thanks,
St.Ack





> There are two main reasons why I think it should not.
>
> 1. With close to 500 sub-tasks, with 6 MB of code changes, and with a
> sizable community behind, it looks to me like a whole new project.
> It is essentially a new storage system, with different (than HDFS)
> architecture, separate S3-like APIs. This is really great - the World sure
> needs more distributed file systems. But it is not clear why Ozone should
> co-exist with HDFS under the same roof.
>
> 2. Ozone is probably just the first step in rebuilding HDFS under a new
> architecture. With the next steps presumably being HDFS-10419 and
> HDFS-8.
> The design doc for the new architecture has never been published. I can
> only assume based on some presentations and personal communications that
> the idea is to use Ozone as a block storage, and re-implement NameNode, so
> that it stores only a partial namesapce in memory, while the bulk of it
> (cold data) is persisted to a local storage.
> Such architecture makes me wonder if it solves Hadoop's main problems.
> There are two main limitations in HDFS:
>   a. The throughput of Namespace operations. Which is limited by the number
> of RPCs the NameNode can handle
>   b. The number of objects (files + blocks) the system can maintain. Which
> is limited by the memory size of the NameNode.
> The RPC performance (a) is more important for Hadoop scalability than the
> object count (b). The read RPCs being the main priority.
> The new architecture targets the object count problem, but in the expense
> of the RPC throughput. Which seems to be a wrong resolution of the
> tradeoff.
> Also based on the use patterns on our large clusters we read up to 90% of
> the data we write, so cold data is a small fraction and most of it must be
> cached.
>
> To summarize:
> - Ozone is a big enough system to deserve its own project.
> - The architecture that Ozone leads to does not seem to solve the intrinsic
> problems of current HDFS.
>
> I will post my opinion in the Ozone jira. Should be more convenient to
> discuss it there for further reference.
>
> Thanks,
> --Konstantin
>
>
>
> On Wed, Oct 18, 2017 at 6:54 PM, Yang Weiwei 
> wrote:
>
> > Hello everyone,
> >
> >
> > I would like to start this thread to discuss merging Ozone (HDFS-7240) to
> > trunk. This feature implements an object store which can co-exist with
> > HDFS. Ozone is disabled by default. We have tested Ozone with cluster
> sizes
> > varying from 1 to 100 data nodes.
> >
> >
> >
> > The merge payload includes the following:
> >
> >   1.  All services, management scripts
> >   2.  Object store APIs, exposed via both REST and RPC
> >   3.  Master service UIs, command line interfaces
> >   4.  Pluggable pipeline Integration
> >   5.  Ozone File System (Hadoop compatible file system implementation,
> > passes all FileSystem contract tests)
> >   6.  Corona - a load generator for Ozone.
> >   7.  Essential documentation added to Hadoop site.
> >   8.  Version specific Ozone Documentation, accessible via service UI.
> >   9.  Docker support for ozone, which enables faster development cycles.
> >
> >
> > To build Ozone and run ozone using docker, please follow instructions in
> > this wiki page. https://cwiki.apache.org/confl
> > uence/display/HADOOP/Dev+cluster+with+docker.
> >
> >
> > We have built a passionate and diverse community to drive this feature
> > development. As a team, we have achieved significant progress in past 3
> > years since first JIRA for HDFS-7240 was opened on Oct 2014. So far, we
> > have resolved almost 400 JIRAs by 20+ contributors/committers from
> > different countries and affiliations. We also want to thank the large
> > number of community members who were supportive of our efforts and
> > contributed ideas and participated in the design of ozone.
> >
> >
> > Please share your thoughts, thanks!
> >
> >
> > -- Weiwei Yang
> >
>
>

Re: Can we update protobuf's version on trunk?

2017-03-30 Thread Stack
On Thu, Mar 30, 2017 at 9:16 AM, Chris Douglas 
wrote:

> On Wed, Mar 29, 2017 at 4:59 PM, Stack  wrote:
> >> The former; an intermediate handler decoding, [modifying,] and
> >> encoding the record without losing unknown fields.
> >>
> >
> > I did not try this. Did you? Otherwise I can.
>
> Yeah, I did. Same format. -C
>
>
Grand.
St.Ack




> >> This looks fine. -C
> >>
> >> > Thanks,
> >> > St.Ack
> >> >
> >> >
> >> > # Using the protoc v3.0.2 tool
> >> > $ protoc --version
> >> > libprotoc 3.0.2
> >> >
> >> > # I have a simple proto definition with two fields in it
> >> > $ more pb.proto
> >> > message Test {
> >> >   optional string one = 1;
> >> >   optional string two = 2;
> >> > }
> >> >
> >> > # This is a text-encoded instance of a 'Test' proto message:
> >> > $ more pb.txt
> >> > one: "one"
> >> > two: "two"
> >> >
> >> > # Now I encode the above as a pb binary
> >> > $ protoc --encode=Test pb.proto < pb.txt > pb.bin
> >> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No
> syntax
> >> > specified for the proto file: pb.proto. Please use 'syntax =
> "proto2";'
> >> > or
> >> > 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2
> >> > syntax.)
> >> >
> >> > # Here is a dump of the binary
> >> > $ od -xc pb.bin
> >> > 0000000     030a    6e6f    1265    7403    6f77
> >> >            \n 003   o   n   e 022 003   t   w   o
> >> > 0000012
> >> >
> >> > # Here is a proto definition file that has a Test Message minus the
> >> > 'two'
> >> > field.
> >> > $ more pb_drops_two.proto
> >> > message Test {
> >> >   optional string one = 1;
> >> > }
> >> >
> >> > # Use it to decode the bin file:
> >> > $ protoc --decode=Test pb_drops_two.proto < pb.bin
> >> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No
> syntax
> >> > specified for the proto file: pb_drops_two.proto. Please use 'syntax =
> >> > "proto2";' or 'syntax = "proto3";' to specify a syntax version.
> >> > (Defaulted
> >> > to proto2 syntax.)
> >> > one: "one"
> >> > 2: "two"
> >> >
> >> > Note how the second field is preserved (absent a field name). It is
> not
> >> > dropped.
> >> >
> >> > If I change the syntax of pb_drops_two.proto to be proto3, the field
> IS
> >> > dropped.
> >> >
> >> > # Here proto file with proto3 syntax specified (had to drop the
> >> > 'optional'
> >> > qualifier -- not allowed in proto3):
> >> > $ more pb_drops_two.proto
> >> > syntax = "proto3";
> >> > message Test {
> >> >   string one = 1;
> >> > }
> >> >
> >> > $ protoc --decode=Test pb_drops_two.proto < pb.bin  > pb_drops_two.txt
> >> > $ more pb_drops_two.txt
> >> > one: "one"
> >> >
> >> >
> >> > I cannot reencode the text output using pb_drops_two.proto. It
> >> > complains:
> >> >
> >> > $ protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt >
> >> > pb_drops_two.bin
> >> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No
> syntax
> >> > specified for the proto file: pb_drops_two.proto. Please use 'syntax =
> >> > "proto2";' or 'syntax = "proto3";' to specify a syntax version.
> >> > (Defaulted
> >> > to proto2 syntax.)
> >> > input:2:1: Expected identifier, got: 2
> >> >
> >> > Proto 2.5 does same:
> >> >
> >> > $ ~/bin/protobuf-2.5.0/src/protoc --encode=Test pb_drops_two.proto <
> >> > pb_drops_two.txt > pb_drops_two.bin
> >> > input:2:1: Expected identifier.
> >> > Failed to parse input.
> >> >
> >> > St.Ack
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Wed, Mar 29, 2017 at 10:14 AM, Stack  wr

Re: Can we update protobuf's version on trunk?

2017-03-29 Thread Stack
On Wed, Mar 29, 2017 at 3:12 PM, Chris Douglas 
wrote:

> On Wed, Mar 29, 2017 at 1:13 PM, Stack  wrote:
> > Is the below evidence enough that pb3 in proto2 syntax mode does not drop
> > 'unknown' fields? (Maybe you want evidence that java tooling behaves the
> > same?)
>
> I reproduced your example with the Java tooling, including changing
> some of the fields in the intermediate representation. As long as the
> syntax is "proto2", it seems to have compatible semantics.
>
>
Thanks.


> > To be clear, when we say proxy above, are we expecting that a pb message
> > deserialized by a process down-the-line that happens to have a crimped
> proto
> > definition that is absent a couple of fields somehow can re-serialize
> and at
> > the end of the line, all fields are present? Or are we talking
> pass-through
> > of the message without rewrite?
>
> The former; an intermediate handler decoding, [modifying,] and
> encoding the record without losing unknown fields.
>
>
I did not try this. Did you? Otherwise I can.

St.Ack


> This looks fine. -C
>
> > Thanks,
> > St.Ack
> >
> >
> > # Using the protoc v3.0.2 tool
> > $ protoc --version
> > libprotoc 3.0.2
> >
> > # I have a simple proto definition with two fields in it
> > $ more pb.proto
> > message Test {
> >   optional string one = 1;
> >   optional string two = 2;
> > }
> >
> > # This is a text-encoded instance of a 'Test' proto message:
> > $ more pb.txt
> > one: "one"
> > two: "two"
> >
> > # Now I encode the above as a pb binary
> > $ protoc --encode=Test pb.proto < pb.txt > pb.bin
> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> > specified for the proto file: pb.proto. Please use 'syntax = "proto2";'
> or
> > 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2
> > syntax.)
> >
> > # Here is a dump of the binary
> > $ od -xc pb.bin
> > 0000000     030a    6e6f    1265    7403    6f77
> >            \n 003   o   n   e 022 003   t   w   o
> > 0000012
> >
> > # Here is a proto definition file that has a Test Message minus the 'two'
> > field.
> > $ more pb_drops_two.proto
> > message Test {
> >   optional string one = 1;
> > }
> >
> > # Use it to decode the bin file:
> > $ protoc --decode=Test pb_drops_two.proto < pb.bin
> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> > specified for the proto file: pb_drops_two.proto. Please use 'syntax =
> > "proto2";' or 'syntax = "proto3";' to specify a syntax version.
> (Defaulted
> > to proto2 syntax.)
> > one: "one"
> > 2: "two"
> >
> > Note how the second field is preserved (absent a field name). It is not
> > dropped.
> >
> > If I change the syntax of pb_drops_two.proto to be proto3, the field IS
> > dropped.
> >
> > # Here proto file with proto3 syntax specified (had to drop the
> 'optional'
> > qualifier -- not allowed in proto3):
> > $ more pb_drops_two.proto
> > syntax = "proto3";
> > message Test {
> >   string one = 1;
> > }
> >
> > $ protoc --decode=Test pb_drops_two.proto < pb.bin  > pb_drops_two.txt
> > $ more pb_drops_two.txt
> > one: "one"
> >
> >
> > I cannot reencode the text output using pb_drops_two.proto. It complains:
> >
> > $ protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt >
> > pb_drops_two.bin
> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> > specified for the proto file: pb_drops_two.proto. Please use 'syntax =
> > "proto2";' or 'syntax = "proto3";' to specify a syntax version.
> (Defaulted
> > to proto2 syntax.)
> > input:2:1: Expected identifier, got: 2
> >
> > Proto 2.5 does same:
> >
> > $ ~/bin/protobuf-2.5.0/src/protoc --encode=Test pb_drops_two.proto <
> > pb_drops_two.txt > pb_drops_two.bin
> > input:2:1: Expected identifier.
> > Failed to parse input.
> >
> > St.Ack
> >
> >
> >
> >
> >
> >
> > On Wed, Mar 29, 2017 at 10:14 AM, Stack  wrote:
> >>
> >> On Tue, Mar 28, 2017 at 4:18 PM, Andrew Wang 
> >> wrote:
> >>>
> >>> >
> >>> > > If unknown fields are dropped, then applications proxying tok

Re: Can we update protobuf's version on trunk?

2017-03-29 Thread Stack
Is the below evidence enough that pb3 in proto2 syntax mode does not drop
'unknown' fields? (Maybe you want evidence that java tooling behaves the
same?)

To be clear, when we say proxy above, are we expecting that a pb message
deserialized by a process down-the-line that happens to have a crimped
proto definition that is absent a couple of fields somehow can re-serialize
and at the end of the line, all fields are present? Or are we talking
pass-through of the message without rewrite?

Thanks,
St.Ack


# Using the protoc v3.0.2 tool
$ protoc --version
libprotoc 3.0.2

# I have a simple proto definition with two fields in it
$ more pb.proto
message Test {
  optional string one = 1;
  optional string two = 2;
}

# This is a text-encoded instance of a 'Test' proto message:
$ more pb.txt
one: "one"
two: "two"

# Now I encode the above as a pb binary
$ protoc --encode=Test pb.proto < pb.txt > pb.bin
[libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
specified for the proto file: pb.proto. Please use 'syntax = "proto2";' or
'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2
syntax.)

# Here is a dump of the binary
$ od -xc pb.bin
0000000     030a    6e6f    1265    7403    6f77
           \n 003   o   n   e 022 003   t   w   o
0000012

# Here is a proto definition file that has a Test Message minus the 'two'
field.
$ more pb_drops_two.proto
message Test {
  optional string one = 1;
}

# Use it to decode the bin file:
$ protoc --decode=Test pb_drops_two.proto < pb.bin
[libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
specified for the proto file: pb_drops_two.proto. Please use 'syntax =
"proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted
to proto2 syntax.)
one: "one"
2: "two"

Note how the second field is preserved (absent a field name). It is not
dropped.

If I change the syntax of pb_drops_two.proto to be proto3, the field IS
dropped.

# Here proto file with proto3 syntax specified (had to drop the 'optional'
qualifier -- not allowed in proto3):
$ more pb_drops_two.proto
syntax = "proto3";
message Test {
  string one = 1;
}

$ protoc --decode=Test pb_drops_two.proto < pb.bin  > pb_drops_two.txt
$ more pb_drops_two.txt
one: "one"


I cannot reencode the text output using pb_drops_two.proto. It complains:

$ protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt >
pb_drops_two.bin
[libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
specified for the proto file: pb_drops_two.proto. Please use 'syntax =
"proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted
to proto2 syntax.)
input:2:1: Expected identifier, got: 2

Proto 2.5 does same:

$ ~/bin/protobuf-2.5.0/src/protoc --encode=Test pb_drops_two.proto <
pb_drops_two.txt > pb_drops_two.bin
input:2:1: Expected identifier.
Failed to parse input.

St.Ack






On Wed, Mar 29, 2017 at 10:14 AM, Stack  wrote:

> On Tue, Mar 28, 2017 at 4:18 PM, Andrew Wang 
> wrote:
>
>> >
>> > > If unknown fields are dropped, then applications proxying tokens and
>> > other
>> > >> data between servers will effectively corrupt those messages, unless
>> we
>> > >> make everything opaque bytes, which- absent the convenient,
>> prenominate
>> > >> semantics managing the conversion- obviate the compatibility
>> machinery
>> > that
>> > >> is the whole point of PB. Google is removing the features that
>> justified
>> > >> choosing PB over its alternatives. Since we can't require that our
>> > >> applications compile (or link) against our updated schema, this
>> creates
>> > a
>> > >> problem that PB was supposed to solve.
>> > >
>> > >
>> > > This is scary, and it potentially affects services outside of the
>> Hadoop
>> > > codebase. This makes it difficult to assess the impact.
>> >
>> > Stack mentioned a compatibility mode that uses the proto2 semantics.
>> > If that carries unknown fields through intermediate handlers, then
>> > this objection goes away. -C
>>
>>
>> Did some more googling, found this:
>>
>> https://groups.google.com/d/msg/protobuf/Z6pNo81FiEQ/fHkdcNtdAwAJ
>>
>> Feng Xiao appears to be a Google engineer, and suggests workarounds like
>> packing the fields into a byte type. No mention of a PB2 compatibility
>> mode. Also here:
>>
>> https://groups.google.com/d/msg/protobuf/bO2L6-_t91Q/-zIaJAR9AAAJ
>>
>> Participants say that unknown fields were dropped for automatic JSON
>> encoding, since you can't losslessly convert to JSON without knowing the
>> type.
>>
>> Unfortunately, it sounds like these are intrinsic differences with PB3.
>>
>>
> As I read it Andrew, the field-dropping happens when pb3 is running in
> proto3 'mode'. Let me try it...
>
> St.Ack
>
>
>
>> Best,
>> Andrew
>>
>
>


Re: Can we update protobuf's version on trunk?

2017-03-29 Thread Stack
On Tue, Mar 28, 2017 at 4:18 PM, Andrew Wang 
wrote:

> >
> > > If unknown fields are dropped, then applications proxying tokens and
> > other
> > >> data between servers will effectively corrupt those messages, unless
> we
> > >> make everything opaque bytes, which- absent the convenient,
> prenominate
> > >> semantics managing the conversion- obviate the compatibility machinery
> > that
> > >> is the whole point of PB. Google is removing the features that
> justified
> > >> choosing PB over its alternatives. Since we can't require that our
> > >> applications compile (or link) against our updated schema, this
> creates
> > a
> > >> problem that PB was supposed to solve.
> > >
> > >
> > > This is scary, and it potentially affects services outside of the
> Hadoop
> > > codebase. This makes it difficult to assess the impact.
> >
> > Stack mentioned a compatibility mode that uses the proto2 semantics.
> > If that carries unknown fields through intermediate handlers, then
> > this objection goes away. -C
>
>
> Did some more googling, found this:
>
> https://groups.google.com/d/msg/protobuf/Z6pNo81FiEQ/fHkdcNtdAwAJ
>
> Feng Xiao appears to be a Google engineer, and suggests workarounds like
> packing the fields into a byte type. No mention of a PB2 compatibility
> mode. Also here:
>
> https://groups.google.com/d/msg/protobuf/bO2L6-_t91Q/-zIaJAR9AAAJ
>
> Participants say that unknown fields were dropped for automatic JSON
> encoding, since you can't losslessly convert to JSON without knowing the
> type.
>
> Unfortunately, it sounds like these are intrinsic differences with PB3.
>
>
As I read it Andrew, the field-dropping happens when pb3 is running in
proto3 'mode'. Let me try it...

St.Ack



> Best,
> Andrew
>


[jira] [Created] (HDFS-11368) LocalFS does not allow setting storage policy so spew running in local mode

2017-01-25 Thread stack (JIRA)
stack created HDFS-11368:


 Summary: LocalFS does not allow setting storage policy so spew 
running in local mode
 Key: HDFS-11368
 URL: https://issues.apache.org/jira/browse/HDFS-11368
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Minor


commit f92a14ade635e4b081f3938620979b5864ac261f
Author: Yu Li 
Date:   Mon Jan 9 09:52:58 2017 +0800

HBASE-14061 Support CF-level Storage Policy

...added setting storage policy, which is nice. Being able to set a storage policy 
came in with hdfs 2.6.0 (HDFS-6584 Support Archival Storage), but you can only do 
this for DFS, not for the local FS.

Upshot is that starting up hbase in standalone mode, which uses localfs, you 
get this exception every time:

{code}
2017-01-25 12:26:53,400 WARN  [StoreOpener-93375c645ef2e649620b5d8ed9375985-1] 
fs.HFileSystem: Failed to set storage policy of 
[file:/var/folders/d8/8lyxycpd129d4fj7lb684dwhgp/T/hbase-stack/hbase/data/hbase/namespace/93375c645ef2e649620b5d8ed9375985/info]
 to [HOT]
java.lang.UnsupportedOperationException: Cannot find specified method 
setStoragePolicy
at 
org.apache.hadoop.hbase.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:209)
at 
org.apache.hadoop.hbase.fs.HFileSystem.setStoragePolicy(HFileSystem.java:161)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hbase.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:207)
at 
org.apache.hadoop.hbase.regionserver.HRegionFileSystem.setStoragePolicy(HRegionFileSystem.java:198)
at org.apache.hadoop.hbase.regionserver.HStore.(HStore.java:237)
at 
org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:5265)
at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:988)
at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:985)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchMethodException: 
org.apache.hadoop.fs.LocalFileSystem.setStoragePolicy(org.apache.hadoop.fs.Path,
 java.lang.String)
at java.lang.Class.getMethod(Class.java:1786)
at 
org.apache.hadoop.hbase.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:205)
...
{code}

It is distracting, at the least. Let me fix it.
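
A rough sketch of the shape of a fix (a hypothetical helper with invented names, 
not the actual patch): probe once by reflection for a setStoragePolicy(Path, 
String) method and quietly no-op on filesystems, like LocalFileSystem here, that 
lack it, instead of warning on every store open.

{code}
import java.lang.reflect.Method;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class StoragePolicySetter {
  // Cached probe result; assumes one FileSystem implementation per process,
  // which is fine for a sketch.
  private static volatile Method setPolicy;
  private static volatile boolean probed = false;

  public static void trySetStoragePolicy(FileSystem fs, Path path, String policy) {
    if (!probed) {
      try {
        setPolicy = fs.getClass().getMethod("setStoragePolicy", Path.class, String.class);
      } catch (NoSuchMethodException e) {
        setPolicy = null; // e.g. LocalFileSystem: treat storage policy as a no-op
      }
      probed = true;
    }
    if (setPolicy == null) {
      return; // best-effort feature; skip quietly rather than warn every time
    }
    try {
      setPolicy.invoke(fs, path, policy);
    } catch (Exception e) {
      // best-effort; a single DEBUG-level log would do here
    }
  }
}
{code}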



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-9187) Check if tracer is null before using it

2015-10-01 Thread stack (JIRA)
stack created HDFS-9187:
---

 Summary: Check if tracer is null before using it
 Key: HDFS-9187
 URL: https://issues.apache.org/jira/browse/HDFS-9187
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tracing
Affects Versions: 2.8.0
Reporter: stack


Saw this when an hbase that had not been updated to htrace-4.0.1 was trying to 
start:

{code}
Oct 1, 5:12:11.861 AM FATAL org.apache.hadoop.hbase.master.HMaster
Failed to become active master
java.lang.NullPointerException
at org.apache.hadoop.fs.Globber.glob(Globber.java:145)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1634)
at org.apache.hadoop.hbase.util.FSUtils.getTableDirs(FSUtils.java:1372)
at 
org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:206)
at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:619)
at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:169)
at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1481)
at java.lang.Thread.run(Thread.java:745)
{code}
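
A minimal sketch of the null guard this issue asks for (illustrative only, not 
the committed patch), assuming the htrace-4 Tracer/TraceScope API that 
Globber-style code uses:

{code}
import java.io.IOException;
import org.apache.htrace.core.TraceScope;
import org.apache.htrace.core.Tracer;

class GlobberSketch {
  private final Tracer tracer; // may be null for callers not yet on htrace-4.0.1

  GlobberSketch(Tracer tracer) {
    this.tracer = tracer;
  }

  void glob() throws IOException {
    // Guard: only open a trace scope when a tracer was actually supplied.
    TraceScope scope = (tracer == null) ? null : tracer.newScope("Globber#glob");
    try {
      // ... the actual glob work goes here ...
    } finally {
      if (scope != null) {
        scope.close();
      }
    }
  }
}
{code}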



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Looking to a Hadoop 3 release

2015-03-04 Thread Stack
In general +1 on 3.0.0. It's time. If we start now, it might make it out by
2016. If we start now, downstreamers can start aligning themselves to land
versions that suit at about the same time.

While two big items have been called out as possible incompatible changes,
and there is ongoing discussion as to whether they are or not*, is there
any chance of getting a longer list of big differences between the
branches? In particular I'd be interested in improvements that are 'off' by
default that would be better defaulted 'on'.

Thanks,
St.Ack

* Let me note that 'compatible' around these parts is a trampled concept
seemingly open to interpretation, with a definition other than what
prevails elsewhere in software. See Allen's list above, and in our
downstream project, the recent HBASE-13149 "HBase server MR tools are
broken on Hadoop 2.5+ Yarn", among others.  Let 3.x be incompatible with
2.x if only so we can leave behind all current notions of 'compatibility'
and just start over (as per Allen).


On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang 
wrote:

> Hi devs,
>
> It's been a year and a half since 2.x went GA, and I think we're about due
> for a 3.x release.
> Notably, there are two incompatible changes I'd like to call out, that will
> have a tremendous positive impact for our users.
>
> First, classpath isolation being done at HADOOP-11656, which has been a
> long-standing request from many downstreams and Hadoop users.
>
> Second, bumping the source and target JDK version to JDK8 (related to
> HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
> months from now). In the past, we've had issues with our dependencies
> discontinuing support for old JDKs, so this will future-proof us.
>
> Between the two, we'll also have quite an opportunity to clean up and
> upgrade our dependencies, another common user and developer request.
>
> I'd like to propose that we start rolling a series of monthly-ish series of
> 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
> other cat herding responsibilities. There are already quite a few changes
> slated for 3.0 besides the above (for instance the shell script rewrite) so
> there's already value in a 3.0 alpha, and the more time we give downstreams
> to integrate, the better.
>
> This opens up discussion about inclusion of other changes, but I'm hoping
> to freeze incompatible changes after maybe two alphas, do a beta (with no
> further incompat changes allowed), and then finally a 3.x GA. For those
> keeping track, that means a 3.x GA in about four months.
>
> I would also like to stress though that this is not intended to be a big
> bang release. For instance, it would be great if we could maintain wire
> compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
> branch-2 and branch-3 similar also makes backports easier, since we're
> likely maintaining 2.x for a while yet.
>
> Please let me know any comments / concerns related to the above. If people
> are friendly to the idea, I'd like to cut a branch-3 and start working on
> the first alpha.
>
> Best,
> Andrew
>


Re: [DISCUSS] Allow continue reading from being-written file using same stream

2014-09-19 Thread Stack
On Thu, Sep 18, 2014 at 12:48 AM, Vinayakumar B 
wrote:

> Hi all,
>
> Currently *DFSInputStream* doesn't allow reading a write-in-progress file,
> once all written bytes, by the time of opening an input stream, are read.
>
> To read further update on the same file, needs to be read by opening
> another stream to the same file again.
>
> Instead, how about refreshing the length of such open files if the current
> position is at the earlier EOF.
>

Are you talking about tailing an HDFS file without having to fake it with a loop
that does open, read till EOF, close, repeat?  If so, that sounds great.
St.Ack
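
For context, the 'fake it' loop looks something like this rough sketch
(illustrative only; the class name and buffer size are made up): re-open the
file on each pass so the client sees the refreshed length.

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsTail {
  // Poll a being-written file: open, seek past what we have already read,
  // read to the current EOF, close, sleep, repeat.
  public static void tail(FileSystem fs, Path path)
      throws IOException, InterruptedException {
    long offset = 0L;
    byte[] buf = new byte[8192];
    while (true) {
      try (FSDataInputStream in = fs.open(path)) { // re-open to see the new length
        in.seek(offset);
        int n;
        while ((n = in.read(buf)) > 0) {
          System.out.write(buf, 0, n);
          offset += n;
        }
        System.out.flush();
      }
      Thread.sleep(1000L); // wait for more bytes to be appended
    }
  }
}

Letting the stream refresh its own length, as proposed, would make a loop like
this unnecessary.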


[jira] [Created] (HDFS-6803) Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context

2014-07-31 Thread stack (JIRA)
stack created HDFS-6803:
---

 Summary: Documenting DFSClient#DFSInputStream expectations reading 
and preading in concurrent context
 Key: HDFS-6803
 URL: https://issues.apache.org/jira/browse/HDFS-6803
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 2.4.1
Reporter: stack
 Attachments: DocumentingDFSClientDFSInputStream (1).pdf

Reviews of the patch posted on the parent task suggest that we be more explicit 
about how DFSIS is expected to behave when being read by contending threads. It 
is also suggested that presumptions made internally be made explicit, documenting 
expectations.

Before we put up a patch, we've made a document of assertions we'd like to make 
into tenets of DFSInputStream.  If there is agreement, we'll attach to this issue 
a patch that weaves the assumptions into DFSIS as javadoc and class comments. 





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6047) TestPread NPE inside in DFSInputStream hedgedFetchBlockByteRange

2014-03-03 Thread stack (JIRA)
stack created HDFS-6047:
---

 Summary: TestPread NPE inside in DFSInputStream 
hedgedFetchBlockByteRange
 Key: HDFS-6047
 URL: https://issues.apache.org/jira/browse/HDFS-6047
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: stack
Assignee: stack
 Fix For: 2.4.0


Our [~andrew.wang] saw this on internal test cluster running trunk:

{code}
java.lang.NullPointerException: null
at 
org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1181)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1296)
at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:78)
at 
org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:108)
at org.apache.hadoop.hdfs.TestPread.pReadFile(TestPread.java:151)
at 
org.apache.hadoop.hdfs.TestPread.testMaxOutHedgedReadPool(TestPread.java:292)
{code}

TestPread was failing.

The NPE comes of our presuming there is always a chosenNode as we set up hedged 
reads inside hedgedFetchBlockByteRange (chosenNode is null'd each time 
through the loop).  Usually there is a chosenNode, but we need to allow for the 
case where there is not.
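
An illustrative sketch of the kind of guard needed (assumed method and parameter 
names; the real loop lives in DFSInputStream#hedgedFetchBlockByteRange):

{code}
import java.util.Collection;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

final class HedgedReadSketch {
  // chosenNode is nulled at the top of each pass through the hedged-read
  // loop, so any bookkeeping on it has to tolerate it staying null.
  static void excludeTriedNode(DatanodeInfo chosenNode,
      Collection<DatanodeInfo> ignored) {
    if (chosenNode != null) { // may be null: no datanode chosen this pass
      ignored.add(chosenNode);
    }
  }
}
{code}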



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: [VOTE] Release Apache Hadoop 2.3.0

2014-02-12 Thread Stack
+1

Downloaded, deployed to small cluster, and then ran an hbase loading on top
of it.  Looks good.

Packaging-wise, is it intentional that some jars show up a few times?  I
can understand webapps bundling a copy but doesn't mapreduce depend on
commons?

share/hadoop/mapreduce/lib/hadoop-annotations-2.3.0.jar
share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/hadoop-annotations-2.3.0.jar
share/hadoop/common/lib/hadoop-annotations-2.3.0.jar

If not intentional, I can make up a better report and file an issue.

Thanks,
St.Ack




On Tue, Feb 11, 2014 at 6:49 AM, Arun C Murthy  wrote:

> Folks,
>
> I've created a release candidate (rc0) for hadoop-2.3.0 that I would like
> to get released.
>
> The RC is available at:
> http://people.apache.org/~acmurthy/hadoop-2.3.0-rc0
> The RC tag in svn is here:
> https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.3.0-rc0
>
> The maven artifacts are available via repository.apache.org.
>
> Please try the release and vote; the vote will run for the usual 7 days.
>
> thanks,
> Arun
>
> PS: Thanks to Andrew, Vinod & Alejandro for all their help in various
> release activities.
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
>


Re: Is it me or is the new bootstrap HDFS UI HWX green? (See tip of hadoop-2.3 branch)

2014-02-04 Thread Stack
On Mon, Feb 3, 2014 at 9:26 PM, Chris Douglas  wrote:
>
> ...
> Please take this offline. -C
>
>
No problem.
St.Ack


Re: Is it me or is the new bootstrap HDFS UI HWX green? (See tip of hadoop-2.3 branch)

2014-02-02 Thread Stack
Sorry for the delay.


On Wed, Jan 29, 2014 at 10:05 PM, Vinod Kumar Vavilapalli  wrote:

>
> My response was to your direct association of the green color to HWX green
> as if it were deliberately done. Nobody here intentionally left a vendor's
> signature like you claimed.



I did not do what you accuse me of above.  Please go back to the original.
 All is couched in 'is it me' and 'I think'.  No accusations of deliberate
vendor insert.  That is your addition.



> And to your other comment "Does the apache binary have to be compiled by
> 'hortonmu'?   Could it be compiled by 'arun', or 'apachemu'?" to the
> message in the build. As if somebody said it has to be.
>


The Apache Hadoop version string should be pure, free of vendor pollution.
 Seems obvious to me.  I could call a vote and get it written into the
bylaws but seems a bit of a useless exercise?

(This is now a non-issue anyway, having been 'fixed'.  While some chose to
do histrionics, another committer spent a few minutes and committed a
patch so builds no longer have to be done on dev machines and can instead
come off Apache Infra and now version string has apache infra in it
instead... nice).



> You know how I'd have raised this? I'd say "Hey guys, seems like the build
> messages have hortonmu and that seems like an issue with our branding. Can
> we fix this?". Then I or somebody could have replied "Oh, that seems
> totally by mistake. Agreed, let's fix this".
>
>
Ain't this what I did, give or take a bit on the wording?



> Instead, you post it in another orthogonal thread (which in itself is
> making claims of causing deliberate confusion of brand), make it look like
> an innocuous question asking if apache binary has to be compiled by the
> specific user-name.
>
>
Sorry. Seemed related to me at the time at least.  I was trying out tip of
the branch and the color made me 'sensitive' and then I tripped over the
version string (It's hard to miss, being up top in our UI).


> I said 'unbelievable'. Sorry, I should have used 'disappointing'. This is
> not the way I'd post 'concerns'.
>
>
You should make up your mind.  When you waffle on your dramatic lead-in,
the 'unbelievable' becoming 'disappointing', it reads like a 'device'.
 Your reaction comes across as false, artificial, not genuine.  Just
saying...



> There is a reason why brand issues are gently discussed on private lists.
> And to think this thread is posted out in the open like this, it was me who
> was taken aback by your oh-not-so-explicit insinuations.
>

I do not apologize for thinking of us as a community mature enough to answer a
basic "it looks like X to me, what do you lot think?" even if X might come
close to the bone for some of us involved here.  A simple "no, you are way
off" or "you may have a point..." and variants thereof was what I was
expecting (You did this up in the related issue, thanks for doing that, but
IMO it would have been more effective if you'd done it in this thread...).

Thanks Vinod,
St.Ack


Re: Issue with my username on my company provided dev box? (Was: …)

2014-01-30 Thread Stack
Thanks Mohammed for the suggestion though I will say you must have a bit of
a perverse streak if you consider this 'enjoyment' -- smile.

Going back to the issue of "username" in our version string, it looks like
Arun won't have to buy a new machine after all. HADOOP-10313 just got
checked in, a script to build release bits up on our shared Apache
infrastructure.

Yours,
St.Ack




On Thu, Jan 30, 2014 at 4:45 PM, Mohammad Islam  wrote:

> I was "enjoying" this discussion from the
> sideline.
>
> I strongly believe the issue could be resolved
> through in-person discussion of the related parties and move forward.
> After that meeting, a synopsis email could be sent
> if that would help and fit the bigger community.
>
> Regards,
> Mohammad
>
>
>
> On Thursday, January 30, 2014 11:32 AM, Stack  wrote:
>
> On Wed, Jan 29, 2014 at 7:31 PM, Arun C Murthy 
> wrote:
>
> >
> > Stack,
> >
> >  Apologies for the late response, I just saw this.
> >
> > On Jan 29, 2014, at 3:33 PM, Stack  wrote:
> >
> > Slightly related, I just ran into this looking back at my 2.2.0 download:
> >
> > [stack@c2020 hadoop-2.2.0]$ ./bin/hadoop version
> > Hadoop 2.2.0
> > Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
> > Compiled by hortonmu on 2013-10-07T06:28Z
> > ...
> >
> > Does the apache binary have to be compiled by 'hortonmu'?   Could it be
> > compiled by 'arun', or 'apachemu'?
> >
> > Thanks,
> > St.Ack
> >
> >
> >  Thank you for tarring all my work here with a brush by insinuating not
> > sure what for using my company provided dev machine to work on Hadoop.
> >
> >
>
> What is up Arun?  A basic query gets warped unrecognizably and given a
> dirty taint.  This in spite of us 'knowing' each other and having worked
> together for years now on this stuff.
>
> While it is true that I changed employers recently, my allegiance when
> around these parts has always been to Apache. I've had a few different
> employers during my time contributing to Hadoop (This is my 4th). I
> challenge you  to find anything in my record that has me rah rah-ing my
> current or previous employers.
>
>
>
> >  I'll try find a non-company provided dev machine to create future
> builds,
> > it might take some time because I'll have to go purchase another one. Or,
> > maybe, another option is to legally change my name.
> >
> >
> Lets chat before you go to such a radical extreme.  In another thread, it
> is implied that changing the build user would be a simple enough affair --
> but I know nothing of your infrastructure.
>
>
>
> >  Meanwhile, while we are on this topic, I just did:
> >
> >  $ git clone git://git.apache.org/hbase.git
> >  $ grep -ri cloudera *
> >
> >  Should I file a jira to fix all refs including the following imports of
> > org.cloudera.* (pasted below) … can you please help fix that? There are
> > more, but I'll leave it to your discretion. Compared to my username on my
> > company provided dev. box, this seems far more egregious. Do you agree?
> >
> >
> This is another project including a third-party project.  That seems like a
> tenuous connection to me but no problem if you would put it in the same
> bucket.  We can file an issue, or probably better as a precursor, get to a
> place where we can discuss these concerns in a civil manner and then file
> the agreed-upon issues to fix (We need this lib to add tracing to hdfs IMO
> -- so if this is in the way of its making it in, lets fix).
>
>
>
> >  In future, it might be useful to focus our efforts on moving the project
> > forward by contributing/reviewing code/docs etc., rather than on petty
> > things like usernames.
> >
> >
> This is a common theme, that the issues I raise are 'petty' or 'trifles'
> but I have trouble reconciling this assertion with the counter reaction
> raised. You react as though I were throwing molotov cocktails.
>
>
> Thanks,
> St.Ack
>


[jira] [Resolved] (HDFS-5852) Change the colors on the hdfs UI

2014-01-30 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HDFS-5852.
-

Resolution: Later

> Change the colors on the hdfs UI
> 
>
> Key: HDFS-5852
> URL: https://issues.apache.org/jira/browse/HDFS-5852
> Project: Hadoop HDFS
>  Issue Type: Bug
>    Reporter: stack
>Assignee: stack
>Priority: Blocker
>  Labels: webui
> Fix For: 2.3.0
>
> Attachments: HDFS-5852.best.txt, HDFS-5852v2.txt, 
> HDFS-5852v3-dkgreen.txt, color-rationale.png, compromise_gray.png, 
> dkgreen.png, hdfs-5852.txt, new_hdfsui_colors.png
>
>
> The HDFS UI colors are too close to HWX green.
> Here is a patch that steers clear of vendor colors.
> I made it a blocker thinking this something we'd want to fix before we 
> release apache hadoop 2.3.0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


Re: Issue with my username on my company provided dev box? (Was: …)

2014-01-30 Thread Stack
On Wed, Jan 29, 2014 at 7:31 PM, Arun C Murthy  wrote:

>
> Stack,
>
>  Apologies for the late response, I just saw this.
>
> On Jan 29, 2014, at 3:33 PM, Stack  wrote:
>
> Slightly related, I just ran into this looking back at my 2.2.0 download:
>
> [stack@c2020 hadoop-2.2.0]$ ./bin/hadoop version
> Hadoop 2.2.0
> Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
> Compiled by hortonmu on 2013-10-07T06:28Z
> ...
>
> Does the apache binary have to be compiled by 'hortonmu'?   Could it be
> compiled by 'arun', or 'apachemu'?
>
> Thanks,
> St.Ack
>
>
>  Thank you for tarring all my work here with a brush by insinuating not
> sure what for using my company provided dev machine to work on Hadoop.
>
>

What is up Arun?  A basic query gets warped unrecognizably and given a
dirty taint.  This in spite of us 'knowing' each other and having worked
together for years now on this stuff.

While it is true that I changed employers recently, my allegiance when
around these parts has always been to Apache. I've had a few different
employers during my time contributing to Hadoop (This is my 4th). I
challenge you  to find anything in my record that has me rah rah-ing my
current or previous employers.



>  I'll try find a non-company provided dev machine to create future builds,
> it might take some time because I'll have to go purchase another one. Or,
> maybe, another option is to legally change my name.
>
>
Lets chat before you go to such a radical extreme.  In another thread, it
is implied that changing the build user would be a simple enough affair --
but I know nothing of your infrastructure.



>  Meanwhile, while we are on this topic, I just did:
>
>  $ git clone git://git.apache.org/hbase.git
>  $ grep -ri cloudera *
>
>  Should I file a jira to fix all refs including the following imports of
> org.cloudera.* (pasted below) … can you please help fix that? There are
> more, but I'll leave it to your discretion. Compared to my username on my
> company provided dev. box, this seems far more egregious. Do you agree?
>
>
This is another project including a third-party project.  That seems like a
tenuous connection to me but no problem if you would put it in the same
bucket.  We can file an issue, or probably better as a precursor, get to a
place where we can discuss these concerns in a civil manner and then file
the agreed-upon issues to fix (We need this lib to add tracing to hdfs IMO
-- so if this is in the way of its making it in, lets fix).



>  In future, it might be useful to focus our efforts on moving the project
> forward by contributing/reviewing code/docs etc., rather than on petty
> things like usernames.
>
>
This is a common theme, that the issues I raise are 'petty' or 'trifles'
but I have trouble reconciling this assertion with the counter reaction
raised. You react as though I were throwing molotov cocktails.

Thanks,
St.Ack


Re: Is it me or is the new bootstrap HDFS UI HWX green? (See tip of hadoop-2.3 branch)

2014-01-29 Thread Stack
On Wed, Jan 29, 2014 at 9:07 PM, Joe Bounour  wrote:

> Hello
>
> I find it fascinating how all the HWX folks jumped on Stack (not taking any
> side, I am Switzerland/French), many against one.
> As a developer, it seems not a relevant topic, true, but to be fair,
> Hortonworks and Cloudera claim most contributors or other fame, so I can see a
> point where a little sensitivity in staying as neutral as possible is
> not a bad idea.
>
> If the build is tagged Cloudera-foo, I can imagine the same email
> explosion with other folks.
> So true, it is not super relevant, but a little neutrality is not that
> bad.
>
> My 2 cents, now Shoot me... :)
>

Thanks for the view from the outside Joe.  We are usually better behaved
than what you've seen here.
St.Ack


Re: Is it me or is the new bootstrap HDFS UI HWX green? (See tip of hadoop-2.3 branch)

2014-01-29 Thread Stack
On Wed, Jan 29, 2014 at 8:48 PM, Suresh Srinivas wrote:
>
>  > Please be more civil in your communique.  Your attack dog 'flair' has
> > likely ruined my little survey.  No one is going to comment afraid that
> > they'll get their heads cut off.
> >
>
> Right next to imploring civil communique, you add an ad hominem.



Do you seriously want to pursue this out in a public dev forum? It is
entertaining I'm sure, if you are not involved, but I for one am a little
disturbed by how this thread has gone and would like to leave it tout de
suite (For those interested, an issue was filed off this thread where the
back and forth has been more civil, and constructive, than what you see out
here: https://issues.apache.org/jira/browse/HDFS-5852).

I can call you offline if you would like to duke this out especially given
I have a different opinion on who started up the ad hominem. On your hopes
that I'll 'cleanup other projects', I don't have the stomach for it.

Thanks,
St.Ack


Re: Is it me or is the new bootstrap HDFS UI HWX green? (See tip of hadoop-2.3 branch)

2014-01-29 Thread Stack
On Wed, Jan 29, 2014 at 5:44 PM, Vinod Kumar Vavilapalli  wrote:

>
> This is unbelievable.
>

Bad opener Vinod.



> This was not deliberate for all I know.
>


Didn't think so (Didn't say it was).



> It is one of the user-names on a machine and it can be anything.
>


Good.  Then it would be easy enough to purge horton references next time
around?



> So, from on, I cannot build and roll a release candidate from my laptop
> which has usernames set to Company*user-name ?
>
>
Umm.  I'm not in charge here.  I was just asking if it had to be this way?



> Can you stop digging into trifles of this kind and adding color and
> unwarranted focus? Looking at everything with green-tinted glasses will
> only let you think it is green.
>
>
Sorry.  I have another opinion to yours on the the import of the items
raised.  I was wondering if others thought as I did and so asked a few
basic questions.

I'm a little taken aback by these kind of responses.

St.Ack


Re: Is it me or is the new bootstrap HDFS UI HWX green? (See tip of hadoop-2.3 branch)

2014-01-29 Thread Stack
With all due respect Suresh, the below is awful nonsense.

I say up front what my motivation is -- avoiding vendor branding in Apache
product -- yet you would put upon me another motivation altogether:
creating 'unneeded controversy' *.  This then becomes a launch pad for a
bunch of non-sequiturs:

1. That I am out to disparage someones 'good work'.
2. Color is neutral.
3. Devs can choose any color they want in a software package.
4. Bigtop/Oozie, other Apache projects altogether, have Cloudera references
in their UI (?) so the take away is HWX can do it too, or, because I raise
an issue here -- it is only legit if I do it too in all projects under the
Apache rainbow?

Please be more civil in your communique.  Your attack dog 'flair' has
likely ruined my little survey.  No one is going to comment afraid that
they'll get their heads cut off.

Thanks,
St.Ack

* I have better things to be doing than 'controversy'.  In fact this kind
of 'controversy' makes me nauseous and makes me question why I contribute in
these parts at all. I was trying out the tip of branch-2.3 to give feedback
before the lads cut an RC when I ran into the items raised here.




On Wed, Jan 29, 2014 at 5:38 PM, Suresh Srinivas wrote:

> Stack,
>
> This seems to me like coloring the good work someone has done with an
> unneeded controversy. Color is a matter of choice. The person who did the
> fine work had all the rights to choose what he sees as fit. I also think
> that while you might think "green" as vendor color, most people probably
> will not make the connection you are making or will not care either.
>
> Now what, shades of green, blue, aqua green, orange as colors that are not
> allowed for web UI? Would you recommend changing the color if another
> vendor pops up with color scheme similar to the Hadoop web UI?
>
> Have you looked at the bigtop main page and all the references to Cloudera
> Jenkins? It does not bother me and I believe it should not either. These
> are the kind of issues that in the end waste time and energy! Have you
> looked at Oozie and are you going to open a jira to change that as well?
>
> We are better off focusing on things that are productive and important work
> related to the project than stirring up unneeded controversy.
>
> Regards,
> Suresh
>
>
> On Wed, Jan 29, 2014 at 3:33 PM, Stack  wrote:
>
> > On Wed, Jan 29, 2014 at 3:01 PM, Stack  wrote:
> >
> > > What do others think?  See here if you do not have access:
> > > http://goo.gl/j05wkf
> > >
> > > It might be a shade darker but I can't tell for sure.  It looks way too
> > > close to me.
> > >
> > > I'd think we'd intentionally go out of our way to avoid putting a
> > > vendor's signature color on our Apache software.
> > >
> > > Asking here before I file a blocker in case it is just a case of color
> > > blindness on my part.
> > >
> > >
> > Slightly related, I just ran into this looking back at my 2.2.0 download:
> >
> > [stack@c2020 hadoop-2.2.0]$ ./bin/hadoop version
> > Hadoop 2.2.0
> > Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
> > Compiled by hortonmu on 2013-10-07T06:28Z
> > ...
> >
> > Does the apache binary have to be compiled by 'hortonmu'?   Could it be
> > compiled by 'arun', or 'apachemu'?
> >
> > Thanks,
> > St.Ack
> >
>
>
>
> --
> http://hortonworks.com/download/
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
>


Re: Is it me or is the new bootstrap HDFS UI HWX green? (See tip of hadoop-2.3 branch)

2014-01-29 Thread Stack
On Wed, Jan 29, 2014 at 4:09 PM, Alejandro Abdelnur wrote:

> IMO we should't be distributing binaries. And if we do so,they should be
> built by a Jenkins job.
>

That would address the second item above.

I filed https://issues.apache.org/jira/browse/HDFS-5852 for the color issue
with a suggested alternative color scheme.

St.Ack


Re: Re-swizzle 2.3

2014-01-29 Thread Stack
I filed https://issues.apache.org/jira/browse/HDFS-5852 as a blocker.  See
what ye all think.

Thanks,
St.Ack


On Wed, Jan 29, 2014 at 3:52 PM, Aaron T. Myers  wrote:

> I just filed this JIRA as a blocker for 2.3:
> https://issues.apache.org/jira/browse/HADOOP-10310
>
> The tl;dr is that JNs will not work with security enabled without this fix.
> If others don't think that supporting QJM with security enabled warrants a
> blocker for 2.3, then we can certainly lower the priority, but it seems
> pretty important to me.
>
> Best,
> Aaron
>
> --
> Aaron T. Myers
> Software Engineer, Cloudera
>
>
> On Wed, Jan 29, 2014 at 6:24 PM, Andrew Wang  >wrote:
>
> > I just finished tuning up branch-2.3 and fixing up the HDFS and Common
> > CHANGES.txt in trunk, branch-2, and branch-2.3. I had to merge back a few
> > JIRAs committed between the swizzle and now where the fix version was 2.3
> > but weren't in branch-2.3.
> >
> > I think the only two HDFS and Common JIRAs that are marked for 2.4 are
> > these:
> >
> > HDFS-5842 Cannot create hftp filesystem when using a proxy user ugi and a
> > doAs on a secure cluster
> > HDFS-5781 Use an array to record the mapping between FSEditLogOpCode and
> > the corresponding byte value
> >
> > Jing, these both look safe to me if you want to merge them back, or I can
> > just do it.
> >
> > Thanks,
> > Andrew
> >
> > On Wed, Jan 29, 2014 at 1:21 PM, Doug Cutting 
> wrote:
> > >
> > > On Wed, Jan 29, 2014 at 12:30 PM, Jason Lowe 
> > wrote:
> > > >  It is a bit concerning that the JIRA history showed that the target
> > version
> > > > was set at some point in the past but no record of it being cleared.
> > >
> > > Perhaps the version itself was renamed?
> > >
> > > Doug
> >
>


[jira] [Created] (HDFS-5852) Change the colors on the hdfs UI

2014-01-29 Thread stack (JIRA)
stack created HDFS-5852:
---

 Summary: Change the colors on the hdfs UI
 Key: HDFS-5852
 URL: https://issues.apache.org/jira/browse/HDFS-5852
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: stack
Priority: Blocker
 Fix For: 2.3.0


The HDFS UI colors are too close to HWX green.

Here is a patch that steers clear of vendor colors.

I made it a blocker thinking this something we'd want to fix before we release 
apache hadoop 2.3.0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


Re: Is it me or is the new bootstrap HDFS UI HWX green? (See tip of hadoop-2.3 branch)

2014-01-29 Thread Stack
On Wed, Jan 29, 2014 at 3:01 PM, Stack  wrote:

> What do others think?  See here if you do not have access:
> http://goo.gl/j05wkf
>
> It might be a shade darker but I can't tell for sure.  It looks way too
> close to me.
>
> I'd think we'd intentionally go out of our way to avoid putting a vendor's
> signature color on our Apache software.
>
> Asking here before I file a blocker in case it is just a case of color
> blindness on my part.
>
>
Slightly related, I just ran into this looking back at my 2.2.0 download:

[stack@c2020 hadoop-2.2.0]$ ./bin/hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
...

Does the apache binary have to be compiled by 'hortonmu'?   Could it be
compiled by 'arun', or 'apachemu'?

Thanks,
St.Ack


Is it me or is the new bootstrap HDFS UI HWX green? (See tip of hadoop-2.3 branch)

2014-01-29 Thread Stack
What do others think?  See here if you do not have access:
http://goo.gl/j05wkf

It might be a shade darker but I can't tell for sure.  It looks way too
close to me.

I'd think we'd intentionally go out of our way to avoid putting a vendor's
signature color on our Apache software.

Asking here before I file a blocker in case it is just a case of color
blindness on my part.

Thanks,
St.Ack


Re: [VOTE] Release Apache Hadoop 2.1.0-beta

2013-08-21 Thread Stack
On Wed, Aug 21, 2013 at 1:25 PM, Colin McCabe wrote:

> St.Ack wrote:
>
> > + Once I figured where the logs were, found that JAVA_HOME was not being
> > exported (don't need this in hadoop-2.0.5 for instance).  Adding an
> > exported JAVA_HOME to my running shell, which doesn't seem right, but it took
> > care of it (I gave up pretty quick on messing w/
> > yarn.nodemanager.env-whitelist and yarn.nodemanager.admin-env -- I wasn't
> > getting anywhere)
>
> I thought that we were always supposed to have JAVA_HOME set when
> running any of these commands.  At least, I do.  How else can the
> system disambiguate between different Java installs?  I need 2
> installs to test with JDK7.
>
>

That is fair enough, but I did not need to define this explicitly previously
(for hadoop-2.0.5-alpha, for instance); either that, or the JAVA_HOME figured in
the start scripts used to be propagated and now is not (I have not dug in).



> > + This did not seem to work for me:
> > <name>hadoop.security.group.mapping</name>
> > <value>org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback</value>.
>
> We've seen this before.  I think your problem is that you have
> java.library.path set correctly (what System.loadLibrary checks), but
> your system library path does not include a necessary dependency of
> libhadoop.so-- most likely, libjvm.so.  Probably, we should fix
> NativeCodeLoader to actually make a function call in libhadoop.so
> before it declares everything OK.
>

My expectation was that if native group lookup fails, as it does here, then
the 'Fallback' would kick in and we'd do the Shell query.  This mechanism
does not seem to be working.


St.Ack


Re: [VOTE] Release Apache Hadoop 2.1.0-beta

2013-08-20 Thread Stack
On Thu, Aug 15, 2013 at 2:15 PM, Arun C Murthy  wrote:

> Folks,
>
> I've created a release candidate (rc2) for hadoop-2.1.0-beta that I would
> like to get released - this fixes the bugs we saw since the last go-around
> (rc1).
>
> The RC is available at:
> http://people.apache.org/~acmurthy/hadoop-2.1.0-beta-rc2/
> The RC tag in svn is here:
> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.1.0-beta-rc2
>
> The maven artifacts are available via repository.apache.org.
>
> Please try the release and vote; the vote will run for the usual 7 days.
>

It basically works (in insecure mode), +1.

+ Checked signature.
+ Ran on small cluster w/ small load made using mapreduce interfaces.
+ Got the HBase full unit test suite to pass on top of it.

I had the following issues getting it to all work. I don't know if they are
known issues so will just list them here first.

+ I could not find documentation on how to go from tarball to running
cluster (the bundled 'cluster' and 'standalone' docs are not about how to
get this tarball off the ground).
+ I had a bit of a struggle putting this release in place under hbase unit
tests.  The container would just exit w/ 127 errcode.  No logs in expected
place.  Tripped over where minimrcluster was actually writing.  Tried to
corral it so it played nicely w/o our general test setup but found that the
new mini clusters have 'target' hardcoded as output dirs.
+ Once I figured where the logs were, found that JAVA_HOME was not being
exported (don't need this in hadoop-2.0.5 for instance).  Adding an
exported JAVA_HOME to my running shell which don't seem right but it took
care of it (I gave up pretty quick on messing w/
yarn.nodemanager.env-whitelist and yarn.nodemanager.admin-env -- I wasn't
getting anywhere)
+ This did not seem to work for me:
<name>hadoop.security.group.mapping</name>
<value>org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback</value>.
 It just did this:

Caused by: java.lang.UnsatisfiedLinkError:
org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V
at org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native
Method)
at
org.apache.hadoop.security.JniBasedUnixGroupsMapping.(JniBasedUnixGroupsMapping.java:49)
at
org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.(JniBasedUnixGroupsMappingWithFallback.java:38)

..so I replaced it
w/ org.apache.hadoop.security.ShellBasedUnixGroupsMapping on the hbase-side
to get my cluster up and running.

+ Untarring the bin tarball, it unpacks as hadoop-X.Y.Z-beta.  Untarring the src
tarball, it unpacks as hadoop-X.Y.Z-beta-src.  I'd have thought they would unpack
into the one directory, overlaying each other.

St.Ack


Re: I'm interested in working with HDFS-4680. Can somebody be a mentor?

2013-07-17 Thread Stack
Folks over at HBase would be interested in helping out.

What does a mentor have to do?  I poked around the icfoss link but didn't
see a list of duties (I've been known to be certified blind on occasion).

I am not up on the malleability of hdfs RPC; is it just a matter of adding
the trace info to a pb header record or would it require more (Sanjay was
saying something recently off-list that trace id is imminent -- but I've
not done the digging)?

St.Ack


On Wed, Jul 17, 2013 at 1:44 PM, Sreejith Ramakrishnan <
sreejith.c...@gmail.com> wrote:

> Hey,
>
> I was originally researching options to work on ACCUMULO-1197. Basically,
> it was a bid to pass trace functionality through the DFSClient. I discussed
> with the guys over there on implementing a Google Dapper-style trace with
> HTrace. The guys at HBase are also trying to achieve the same HTrace
> integration [HBASE-6449]
>
> But, that meant adding stuff to the RPC in HDFS. For a start, we've to add
> a 64-bit span-id to every RPC with tracing enabled. There's some more in
> the original Dapper paper and HTrace documentation.
>
> I was told by the Accumulo people to talk with and seek help from the
> experts at HDFS. I'm open to suggestions.
>
> Additionally, I'm participating in a Joint Mentoring Programme by Apache
> which is quite similar to GSoC. Luciano Resende (Community Development,
> Apache) is in charge of the programme. I'll attach a link. The last date is
> 19th July. So, I'm pretty tense without any mentors :(
>
> [1] https://issues.apache.org/jira/browse/ACCUMULO-1197
> [2] https://issues.apache.org/jira/browse/HDFS-4680
> [3] https://github.com/cloudera/htrace
> [4] http://community.apache.org/mentoringprogramme-icfoss-pilot.html
> [5] https://issues.apache.org/jira/browse/HBASE-6449
>
> Thank you,
> Sreejith R
>


[jira] [Created] (HDFS-4580) 0.95 site build failing with 'maven-project-info-reports-plugin: Could not find goal 'dependency-info''

2013-03-08 Thread stack (JIRA)
stack created HDFS-4580:
---

 Summary: 0.95 site build failing with 
'maven-project-info-reports-plugin: Could not find goal 'dependency-info''
 Key: HDFS-4580
 URL: https://issues.apache.org/jira/browse/HDFS-4580
 Project: Hadoop HDFS
  Issue Type: Bug
    Reporter: stack


Our report plugin is 2.4.  The mvn report page says that 'dependency-info' is new 
since 2.5:


project-info-reports:dependency-info (new in 2.5) is used to generate code 
snippets to be added to build tools.

http://maven.apache.org/plugins/maven-project-info-reports-plugin/

Let me try upgrading our reports plugin.  I tried reproducing locally, running the 
same mvn version, but it just works for me.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: VOTE: HDFS-347 merge

2013-02-17 Thread Stack
+1


On Sun, Feb 17, 2013 at 1:48 PM, Colin McCabe wrote:

> Hi all,
>
> I would like to merge the HDFS-347 branch back to trunk.  It's been
> under intensive review and testing for several months.  The branch
> adds a lot of new unit tests, and passes Jenkins as of 2/15 [1]
>
> We have tested HDFS-347 with both random and sequential workloads. The
> short-circuit case is substantially faster [2], and overall
> performance looks very good.  This is especially encouraging given
> that the initial goal of this work was to make security compatible
> with short-circuit local reads, rather than to optimize the
> short-circuit code path.  We've also stress-tested HDFS-347 on a
> number of clusters.
>
> This iniial VOTE is to merge only into trunk.  Just as we have done
> with our other recent merges, we will consider merging into branch-2
> after the code has been in trunk for few weeks.
>
> Please cast your vote by EOD Sunday 2/24.
>
> best,
> Colin McCabe
>
> [1]
> https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13579704&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13579704
>
> [2]
> https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13551755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13551755
>


Re: Release numbering for branch-2 releases

2013-02-05 Thread Stack
On Mon, Feb 4, 2013 at 2:14 PM, Suresh Srinivas wrote:

> On Mon, Feb 4, 2013 at 1:07 PM, Owen O'Malley  wrote:
>
> > I think that using "-(alpha,beta)" tags on the release versions is a
> really
> > bad idea.
>
>
> Why? Can you please share some reasons?
>
>
We already had a means of denoting 'alpha' software -- release candidates
-- and of 'beta': early versions of a major release, which were installed with
trepidation by all but the clueless.

We also had a place for API changes and wire format revamps; they were done
in the next major version, not between point releases (caveat unintended
mess-ups).

The -alpha and -beta designations muddy hard-won understanding of what the
numbers mean.



> I actually think alpha and beta and stable/GA are a much better way to set
> the expectation
> of the quality of a release. This has been practiced in software release
> cycles for a long time.
>

Not in hadoop though, not until these 2.0ings.



> Having an option to release alpha is good for releasing early and getting
> feedback from
> people who can try it out, and at the same time warning other, less
> adventurous users about
> quality expectations.
>
>
Let's call it a snapshot instead because alpha is damaged (IMO).

Thanks Suresh,
St.Ack


Re: Release numbering for branch-2 releases

2013-02-04 Thread Stack
On Mon, Feb 4, 2013 at 10:46 AM, Arun C Murthy  wrote:

> Would it be better to have 2.0.3-alpha, 2.0.4-beta and then make 2.1 a
> stable release? This way we just have one series (2.0.x) which is not
> suitable for general consumption.
>
>

That contains the versioning damage to the 2.0.x set.  This is an
improvement over the original proposal where we let the versioning mayhem
run out to 2.3.

Thanks Arun,
St.Ack


Re: Release numbering for branch-2 releases

2013-02-02 Thread Stack
On Fri, Feb 1, 2013 at 3:03 AM, Tom White  wrote:

> On Wed, Jan 30, 2013 at 11:32 PM, Vinod Kumar Vavilapalli
>  wrote:
> > I still have a list of pending API/protocol cleanups in YARN that need to
> be
> > in before we even attempt supporting compatibility further down the road.
>
>
YARN requires changing HDFS/MapReduce API/wire-protocol?  Can't it be done
in hadoop 3.x?



> Just caught up with the discussion on the referred JIRAs. I can clearly
> see
> > how a single release with an umbrella alpha/beta tag is causing tensions
> > *only* because we have a single project and product. More reinforcement
> for
> > my proclivity towards separate releases and by extension towards the
> > projects' split.
>
> Good point. There's nothing to stop us doing separate releases of
> sub-project components now. Doing so might help us find
> incompatibilities between the different components in a release line
> (2.x at the moment).
>
>

I like the sound of this.  So, if HDFS, say, went unscathed by the higher
level API and wire-protocol machinations, it could make its way out to a
2.0.0 (or 2.0.4) absent the -beta/-alpha tails?

That'd help us downstreamers (As is, just explaining our now out-of-date
hadoop dependency takes a couple of pages of the hbase reference
guide [1] -- and we haven't started in on how you'd run against hadoop2).

Thanks,
St.Ack
1. http://hbase.apache.org/book.html#hadoop


Re: Release numbering for branch-2 releases

2013-02-01 Thread Stack
On Thu, Jan 31, 2013 at 12:12 PM, Arun C Murthy  wrote:

> I apologize if there was too much technical details.
>
> The simplified version is that hadoop-2 isn't baked as it stands today,
> and is not viable to be supported by this community in a stable manner. In
> particular, it is due to the move to PB for HDFS protocols and the freshly
> minted YARN apis/protocols. As a result, we have been forced to make
> (incompatible) changes in every hadoop-2 release so far (2.0.0, 2.0.2
> etc.). Since we released the previous bits we have found security issues,
> bugs and other issues which will cause long-term maintenance harm (details
> are in the HADOOP/HDFS/YARN jiras in the original email).
>
> My aim, as the RM, is to try nudge (nay, force) all contributors to spend
> time over the next couple of months focussing on fixing known issues and to
> look for other surprises - this way I hope to ensure we do not have further
> incompatible changes for downstream projects and we can support hadoop-2
> for at least a couple of years. I hope this makes sense to you. I don't
> think turning around and calling these 3.x or 4.x makes things better since
> no amount of numbering lipstick will make the software better or viable for
> the long-term for both users and other projects. Worse, it will force HBase
> and other projects to deal with *even more* major Hadoop releases... which
> seems like a royal pita.
>
> I hope that clarifies things. Thanks Stack.
>


Tom above puts his finger on the problem I am having.  It seems that the
'hadoop versioning' is arbitrary, flouts convention, and on top of that is
without a discernible pattern (2.0.0 is actually going to be 2.3.0?).  It
is also tantalizing as it holds out the promise of a 2.0.0 or a 2.1.0,
etc., but seemingly these will never ship.

Above you call 3.x and 4.x 'numbering lipstick' -- nice one! -- but to
this 'casual observer', IMO, it would be more a case of calling a spade a
spade; i.e. 3.x.x, a major version change, has API and possibly wire protocol
changes in it.

Thank you for taking the time to dumb it all down for me Arun,
St.Ack


Re: Release numbering for branch-2 releases

2013-01-30 Thread Stack
On Tue, Jan 29, 2013 at 12:56 PM, Arun C Murthy  wrote:

> Folks,
>
>  There has been some discussions about incompatible changes in the
> hadoop-2.x.x-alpha releases on HADOOP-9070, HADOOP-9151, HADOOP-9192 and
> few other jiras. Frankly, I'm surprised about some of them since the
> 'alpha' moniker was precisely to harden apis by changing them if necessary,
> borne out by the fact that every  single release in hadoop-2 chain has had
> incompatible changes. This happened since we were releasing early, moving
> fast and breaking things. Furthermore, we'll have more in future as move
> towards stability of hadoop-2 similar to HDFS-4362, HDFS-4364 et al in HDFS
> and YARN-142 (api changes) for YARN.
>
>  So, rather than debate more, I had a brief chat with Suresh and Todd.
> Todd suggested calling the next release as hadoop-2.1.0-alpha to indicate
> the incompatibility a little better. This makes sense to me, as long as we
> are clear that we won't make any further *feature* releases in hadoop-2.0.x
> series (obviously we might be forced to do security/bug-fix release).
>
>  Going forward, I'd like to start locking down apis/protocols for a 'beta'
> release. This way we'll have one *final* opportunity post
> hadoop-2.1.0-alpha to make incompatible changes if necessary and we can
> call it hadoop-2.2.0-beta.
>
>  Post hadoop-2.2.0-beta we *should* lock down and not allow incompatible
> changes. This will allow us to go on to a hadoop-2.3.0 as a GA release.
> This forces us to do a real effort on making sure we lock down for
> hadoop-2.2.0-beta.
>
>  In summary:
>  # I plan to now release hadoop-2.1.0-alpha (this week).
>  # We make a real effort to lock down apis/protocols and release
> hadoop-2.2.0-beta, say in March.
>  # Post 'beta' release hadoop-2.3.0 as 'stable' sometime in May.
>
>  I'll start a separate thread on 'locking protocols' w.r.t
> client-protocols v/s internal protocols (to facilitate rolling upgrades
> etc.), let's discuss this one separately.
>
>  Makes sense?



No.

I find the above opaque and written in a cryptic language that I might grok
if I spent a day or two running over the cited issues trying to make some
distillation of the esoterica debated therein.  If you want feedback from
other than the cognoscenti, I would suggest a better summation of what all
is involved.  I think jargon is fine for arcane technical discussion but it
seems we are talking basic hadoop versioning here and, if I am following at
all, we are talking about possibly breaking API (?) and even wire protocol
inside a major version: i.e. between 2.0.x and 2.3.x, say (give or take an
-alpha or -beta suffix thrown in here and there).  Does this have to be?
Can't we do API changes and wire protocol changes off in hadoop 3.x and
4.x, etc.?  As is, how is a little ol' downstream project like the one I
work on supposed to cope w/ this plethora of 2.X.X-{alpha,beta,?}, with
each new 2.x possibly a whole new 'experience'?

Thanks Arun,
St.Ack


[jira] [Created] (HDFS-4239) Means of telling the datanode to stop using a sick disk

2012-11-29 Thread stack (JIRA)
stack created HDFS-4239:
---

 Summary: Means of telling the datanode to stop using a sick disk
 Key: HDFS-4239
 URL: https://issues.apache.org/jira/browse/HDFS-4239
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: stack


If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing 
occasionally, or just exhibiting high latency -- your choices are:

1. Decommission the total datanode.  If the datanode is carrying 6 or 12 disks 
of data, especially on a cluster that is smallish -- 5 to 20 nodes -- the 
rereplication of the downed datanode's data can be pretty disruptive, 
especially if the cluster is doing low latency serving: e.g. hosting an hbase 
cluster.

2. Stop the datanode, unmount the bad disk, and restart the datanode (You can't 
unmount the disk while it is in use).  The latter is better in that only the 
bad disk's data is rereplicated, not all of the datanode's data.

Is it possible to do better -- say, send the datanode a signal to tell it to stop 
using a disk an operator has designated 'bad'?  This would be like option #2 
above minus the need to stop and restart the datanode.  Ideally the disk would 
become unmountable after a while.

Nice to have would be being able to tell the datanode to restart using a disk 
after it's been replaced.
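A purely hypothetical sketch of what the requested facility might look like -- 
neither this interface nor its method names exist in Hadoop; it only 
illustrates the operation being asked for:

{code}
// Hypothetical sketch only: no such protocol exists in Hadoop; the
// names are invented to illustrate the requested operation.
public interface DataNodeVolumeAdmin {
  /** Tell the datanode to stop scheduling reads/writes on this volume. */
  void retireVolume(String mountPoint);

  /** Re-enable a volume once the disk has been replaced. */
  void restoreVolume(String mountPoint);
}
{code}

As with option #2 above, only the retired volume's replicas would need 
rereplication, but without a datanode restart; once the datanode stops touching 
the volume, the operator can unmount it.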





--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4203) After recoverFileLease, datanode gets stuck complaining block '...has out of date GS ....may already be committed'

2012-11-16 Thread stack (JIRA)
stack created HDFS-4203:
---

 Summary: After recoverFileLease, datanode gets stuck complaining 
block '...has out of date GS may already be committed'
 Key: HDFS-4203
 URL: https://issues.apache.org/jira/browse/HDFS-4203
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.1.0
Reporter: stack


After calling recoverFileLease, an append to a file gets stuck retrying this:

{code}
2012-11-16 13:06:14,298 DEBUG [IPC Server handler 2 on 53224] 
namenode.PendingReplicationBlocks(92): Removing pending replication for 
blockblk_-3222397051272483489_1006
2012-11-16 13:06:43,881 WARN  [DataStreamer for file /hbase/hlog/hlog.dat.2 
block blk_-3222397051272483489_1003] hdfs.DFSClient$DFSOutputStream(3216): 
Error Recovery for block blk_-3222397051272483489_1003 bad datanode[0] 
127.0.0.1:53228
2012-11-16 13:06:43,881 WARN  [DataStreamer for file /hbase/hlog/hlog.dat.2 
block blk_-3222397051272483489_1003] hdfs.DFSClient$DFSOutputStream(3267): 
Error Recovery for block blk_-3222397051272483489_1003 in pipeline 
127.0.0.1:53228, 127.0.0.1:53231: bad datanode 127.0.0.1:53228
2012-11-16 13:06:43,884 INFO  [IPC Server handler 1 on 53233] 
datanode.DataNode(2123): Client calls 
recoverBlock(block=blk_-3222397051272483489_1003, targets=[127.0.0.1:53231])
2012-11-16 13:06:43,884 DEBUG [IPC Server handler 1 on 53233] 
datanode.FSDataset(2143): Interrupting active writer threads for block 
blk_-3222397051272483489_1006
2012-11-16 13:06:43,884 DEBUG [IPC Server handler 1 on 53233] 
datanode.FSDataset(2159): getBlockMetaDataInfo successful 
block=blk_-3222397051272483489_1006 length 120559 genstamp 1006
2012-11-16 13:06:43,884 DEBUG [IPC Server handler 1 on 53233] 
datanode.DataNode(2039): block=blk_-3222397051272483489_1003, (length=120559), 
syncList=[BlockRecord(info=BlockRecoveryInfo(block=blk_-3222397051272483489_1006
 wasRecoveredOnStartup=false) node=127.0.0.1:53231)], closeFile=false
2012-11-16 13:06:43,885 INFO  [IPC Server handler 2 on 53224] 
namenode.FSNamesystem(5468): blk_-3222397051272483489_1003 has out of date GS 
1003 found 1006, may already be committed
2012-11-16 13:06:43,885 ERROR [IPC Server handler 2 on 53224] 
security.UserGroupInformation(1139): PriviledgedActionException as:stack 
cause:java.io.IOException: blk_-3222397051272483489_1003 has out of date GS 
1003 found 1006, may already be committed
2012-11-16 13:06:43,885 ERROR [IPC Server handler 1 on 53233] 
security.UserGroupInformation(1139): PriviledgedActionException 
as:blk_-3222397051272483489_1003 cause:org.apache.hadoop.ipc.RemoteException: 
java.io.IOException: blk_-3222397051272483489_1003 has out of date GS 1003 
found 1006, may already be committed
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.nextGenerationStampForBlock(FSNamesystem.java:5469)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.nextGenerationStamp(NameNode.java:781)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

2012-11-16 13:06:43,886 WARN  [DataStreamer for file /hbase/hlog/hlog.dat.2 
block blk_-3222397051272483489_1003] hdfs.DFSClient$DFSOutputStream(3292): 
Failed recovery attempt #1 from primary datanode 127.0.0.1:53231
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.ipc.RemoteException: 
java.io.IOException: blk_-3222397051272483489_1003 has out of date GS 1003 
found 1006, may already be committed
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.nextGenerationStampForBlock(FSNamesystem.java:5469)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.nextGenerationStamp(NameNode.java:781)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
at org.apache.hadoop.ipc.Server$Handler$1.run(S

[jira] [Reopened] (HDFS-4184) Add the ability for Client to provide more hint information for DataNode to manage the OS buffer cache more accurately

2012-11-12 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HDFS-4184:
-


Here, I reopened it for you (in case you can't)

> Add the ability for Client to provide more hint information for DataNode to 
> manage the OS buffer cache more accurately
> 
>
> Key: HDFS-4184
> URL: https://issues.apache.org/jira/browse/HDFS-4184
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: binlijin
>
> HDFS now has the ability to use posix_fadvise and sync_data_range syscalls to 
> manage the OS buffer cache.
> {code}
> When hbase reads the hlog we can set dfs.datanode.drop.cache.behind.reads 
> to true to drop data out of the buffer cache when performing sequential reads.
> When hbase writes the hlog we can set dfs.datanode.drop.cache.behind.writes to 
> true to drop data out of the buffer cache after writing.
> When hbase reads hfiles during compaction we can set 
> dfs.datanode.readahead.bytes to a non-zero value to trigger readahead for 
> sequential reads.
> and so on... 
> {code}
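The settings named above are ordinary hdfs-site.xml properties.  For 
illustration, a minimal sketch setting them programmatically through Hadoop's 
Configuration API (the values are illustrative, not recommendations):

{code}
import org.apache.hadoop.conf.Configuration;

public class DropCacheSettings {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Drop data from the OS buffer cache after sequential reads (e.g. hlog replay).
    conf.setBoolean("dfs.datanode.drop.cache.behind.reads", true);
    // Drop data from the OS buffer cache once it has been written (e.g. hlog writes).
    conf.setBoolean("dfs.datanode.drop.cache.behind.writes", true);
    // Trigger readahead for sequential reads (e.g. compactions); value illustrative.
    conf.setLong("dfs.datanode.readahead.bytes", 4L * 1024 * 1024);
    System.out.println("readahead = " + conf.get("dfs.datanode.readahead.bytes"));
  }
}
{code}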

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4184) Add new interface for Client to provide more information

2012-11-12 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HDFS-4184.
-

Resolution: Invalid

Resolving as invalid since there is not enough detail.

The JIRA subject and description do not seem to match.  As per Ted in the previous 
issue, please add more detail when you create an issue so we can better know what 
you are referring to.  Meantime I'm closing this.  Open a new one with a better 
specification (this seems to require a particular version of hadoop, etc.).

Thanks Binlijin.

> Add new interface for Client to provide more information
> 
>
> Key: HDFS-4184
> URL: https://issues.apache.org/jira/browse/HDFS-4184
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: binlijin
>
> When hbase reads or writes the hlog we can use 
> dfs.datanode.drop.cache.behind.reads and dfs.datanode.drop.cache.behind.writes; 
> when hbase reads hfiles during compaction we can use readahead, and so on... 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [VOTE] Merge HDFS-3077 (QuorumJournalManager) branch to trunk

2012-09-27 Thread Stack
On Thu, Sep 27, 2012 at 2:06 AM, Konstantin Shvachko
 wrote:
> The SPOF is in HDFS. This project is about shared storage
> implementation, that could be replaced by NFS or BookKeeper or
> something else.

You cannot equate QJM to a solution that requires an NFS filer.  A
filer is just not possible in the deploys I am privy to.

> Same if QJ failed. You go to the creators, which will be somewhat
> closer than NetApp, because it is still Hadoop.
>

You seem to be undoing yourself with the above setup, Konstantin.  At
our deploy, we can't do NetApp so calling them will never happen.  If
there's a problem in QJM, it's Apache HDFS, so it'll be fixed by the community
-- hopefully w/ input by the creators -- as any other issue would be fixed
(or not).

St.Ack


Re: [VOTE] Merge HDFS-3077 (QuorumJournalManager) branch to trunk

2012-09-27 Thread Stack
On Wed, Sep 26, 2012 at 4:21 PM, Konstantin Shvachko
 wrote:
> Don't understand your argument. Else where?

You suggest users should download HDFS and then go to another project
(or subproject) -- i.e. 'elsewhere' -- to get something fundamental: a fix for
the SPOF.  IMO, the SPOF-fix belongs in HDFS core.

St.Ack


Re: [VOTE] Merge HDFS-3077 (QuorumJournalManager) branch to trunk

2012-09-26 Thread Stack
On Tue, Sep 25, 2012 at 11:21 PM, Konstantin Shvachko
 wrote:
> I think this is a great work, Todd.
> And I think we should not merge it into trunk or other branches.
> As I suggested earlier on this list I think this should be spun off
> as a separate project or a subproject.
>

I'd be -1 on that.

Users shouldn't have to go elsewhere to get a fix for SPOF.

St.Ack


[jira] [Created] (HDFS-2408) DFSClient#getNumCurrentReplicas is package private in 205 but public in branch-0.20-append

2011-10-05 Thread stack (Created) (JIRA)
DFSClient#getNumCurrentReplicas is package private in 205 but public in 
branch-0.20-append
--

 Key: HDFS-2408
 URL: https://issues.apache.org/jira/browse/HDFS-2408
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.20.205.1
Reporter: stack
Assignee: stack


The below commit broke hdfs-826 for hbase in 205 rc1.  It changes the 
accessibility from public to package private on getNumCurrentReplicas, and now 
currently shipping hbase releases, at least, cannot get at this method.

{code}
Revision 1174483
Modified Fri Sep 23 01:30:18 2011 UTC (13 days, 4 hours ago) by szetszwo
File length: 136876 byte(s)
svn merge -c 1171137 from branch-0.20-security for HDFS-2333.
{code}

Here is diff between above change and one just previous:

http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security-205/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java?view=diff&r1=1174479&r2=1174483&diff_format=u

This is a critical facility for us.

It seems like making this one method public again is all that's needed.  I can 
make a patch like the below:

diff --git a/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java 
b/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java
index b9cb053..39955c9 100644
--- a/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java
+++ b/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java
@@ -3569,7 +3569,7 @@ public class DFSClient implements FSConstants, 
java.io.Closeable {
  * block is not yet allocated, then this API will return 0 because there 
are
  * no replicas in the pipeline.
  */
-int getNumCurrentReplicas() throws IOException {
+public int getNumCurrentReplicas() throws IOException {
   synchronized(dataQueue) {
 if (nodes == null) {
   return blockReplication;

Can we get this into RC2?

Thanks,
St.Ack
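P.S. While the method is package private, the only way a caller outside the 
package can reach it is via reflection.  A minimal sketch of that workaround, 
assuming the stream was opened against HDFS so getWrappedStream() hands back 
the underlying DFSOutputStream (the helper class itself is illustrative):

{code}
import java.io.OutputStream;
import java.lang.reflect.Method;

import org.apache.hadoop.fs.FSDataOutputStream;

public final class ReplicaCount {
  /**
   * Illustrative workaround while getNumCurrentReplicas is package
   * private: unwrap the FSDataOutputStream and invoke the method with
   * accessibility checks suppressed.
   */
  public static int getNumCurrentReplicas(FSDataOutputStream out) throws Exception {
    OutputStream wrapped = out.getWrappedStream(); // the DFSOutputStream
    Method m = wrapped.getClass().getDeclaredMethod("getNumCurrentReplicas");
    m.setAccessible(true); // bypass the package-private modifier
    return ((Integer) m.invoke(wrapped)).intValue();
  }
}
{code}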

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2296) If read error while lease is being recovered, client reverts to stale view on block info

2011-08-29 Thread stack (JIRA)
If read error while lease is being recovered, client reverts to stale view on 
block info


 Key: HDFS-2296
 URL: https://issues.apache.org/jira/browse/HDFS-2296
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.20-append, 0.22.0, 0.23.0
Reporter: stack
Priority: Critical


We are seeing the following issue around recoverLease over in hbaselandia.  
DFSClient calls recoverLease to assume ownership of a file.  The recoverLease 
returns to the client but it can take time for the new state to propagate.  
Meantime, an incoming read fails though it's using updated block info.  
Thereafter all read retries fail because on exception we revert to the stale block 
view and we never recover.  Laxman reports this issue in the below mailing 
thread:

See this thread for first report of this issue: 
http://search-hadoop.com/m/S1mOHFRmgk2/%2527FW%253A+Handling+read+failures+during+recovery%2527&subj=FW+Handling+read+failures+during+recovery

Chatting w/ Hairong offline, she suggests this a general issue around lease 
recovery no matter how it triggered (new recoverLease or not).

I marked this critical.  At least over in hbase it is, since we get stuck 
here recovering a crashed server.
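Because the recovered state takes time to propagate, a caller cannot treat a 
single recoverLease call as synchronous.  A minimal sketch of the polling 
pattern, assuming a recoverLease(Path) that returns true once the file is 
closed (true of the versions where HBase used it); the timeout and sleep 
values are illustrative:

{code}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public final class LeaseRecovery {
  /**
   * Minimal sketch: keep retrying recoverLease until it reports the
   * file closed, instead of assuming the first call is instantly
   * visible.  Values are illustrative.
   */
  public static boolean recoverLease(DistributedFileSystem fs, Path p)
      throws Exception {
    long deadline = System.currentTimeMillis() + 60000L;
    while (System.currentTimeMillis() < deadline) {
      if (fs.recoverLease(p)) {
        return true;          // lease recovered and file closed
      }
      Thread.sleep(1000L);    // give the new state time to propagate
    }
    return false;
  }
}
{code}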

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-1948) Forward port 'hdfs-1520 lightweight namenode operation to trigger lease recovery'

2011-05-16 Thread stack (JIRA)
Forward port 'hdfs-1520 lightweight namenode operation to trigger lease 
recovery'
--

 Key: HDFS-1948
 URL: https://issues.apache.org/jira/browse/HDFS-1948
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: stack


This issue is about forward porting from branch-0.20-append the little namenode 
api that facilitates stealing of a file's lease.  The forward port would be an 
adaption of hdfs-1520 and its companion patches, hdfs-1555 and hdfs-1554, to 
suit the TRUNK.

Intent is to get this fix into 0.22, time permitting; I'll run a vote to get an OK on 
getting it added to the branch.  HBase needs this facility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Please join me in welcoming the following people as committers to the Hadoop project

2011-01-05 Thread Stack
Congrats lads.
St.Ack

On Wed, Jan 5, 2011 at 7:40 PM, Ian Holsman  wrote:
> On behalf of the Apache Hadoop PMC, I would like to extend a warm welcome to 
> the following people,
> who have all chosen to accept the role of committers on Hadoop.
>
> In alphabetical order:
>
> - Aaron Kimball
> - Allen Wittenauer
> - Amar Kamat
> - Dmytro Molkov
> - Jitendra Pandey
> - Kan Zhang
> - Ravi Gummadi
> - Sreekanth Ramakrishna
> - Todd Lipcon
>
> I appreciate all the hard work these people have put into the project so far, 
> and look forward to the contributions they will make to Hadoop in the 
> future.
>
> Well done guys!
>
>
> --Ian
>
>


[MOTION PASSED, VOTE CLOSED] WAS -> Re: [VOTE] Commit hdfs-1024 to 0.20 branch

2010-04-05 Thread Stack
Thanks to all who participated in the vote.

I'll commit in a minute.

St.Ack


On Fri, Apr 2, 2010 at 10:38 AM, Stack  wrote:
> Please vote on committing HDFS-1024 to the hadoop 0.20 branch.
>
> Background:
>
> HDFS-1024 fixes possible trashing of the fsimage because of a failed copy
> between the 2NN and NN.  Ordinarily, possible corruption of this proportion
> would merit commit w/o need of a vote, only Dhruba correctly notes that
> UNLESS both NN and 2NN are upgraded, HDFS-1024 becomes an incompatible
> change (the NN<->2NN communication will fail always).  IMO, this
> incompatible change can be plastered over with a release note; e.g.
> WARNING, you MUST update NN and 2NN when you go to 0.20.3 hadoop.  If
> you agree with me, please vote +1 on commit.
>
> Thanks,
> St.Ack
>


[VOTE] Commit hdfs-1024 to 0.20 branch

2010-04-02 Thread Stack
Please vote on committing HDFS-1024 to the hadoop 0.20 branch.

Background:

HDFS-1024 fixes possible trashing of the fsimage because of a failed copy
between the 2NN and NN.  Ordinarily, possible corruption of this proportion
would merit commit w/o need of a vote, only Dhruba correctly notes that
UNLESS both NN and 2NN are upgraded, HDFS-1024 becomes an incompatible
change (the NN<->2NN communication will fail always).  IMO, this
incompatible change can be plastered over with a release note; e.g.
WARNING, you MUST update NN and 2NN when you go to 0.20.3 hadoop.  If
you agree with me, please vote +1 on commit.

Thanks,
St.Ack


Re: [VOTE] Commit HDFS-927 to both 0.20 and 0.21 branch?

2010-02-08 Thread Stack
Vote is closed (unless there is objection).  I'll commit below in the next
day or so.
Thanks to all who participated.
St.Ack

On Mon, Feb 8, 2010 at 11:26 AM, Todd Lipcon  wrote:
> Given people have had several days to vote, and there have been no
> -1s, this should be good to go in, right? We have two HDFS committer
> +1s (Stack and Nicholas) and nonbinding +1s from several others.
>
> Thanks
> -Todd
>
> On Thu, Feb 4, 2010 at 1:30 PM, Tsz Wo (Nicholas), Sze
>  wrote:
>>
>> This is a friendly reminder for voting on committing HDFS-927 to 0.20 and 
>> 0.21.
>>
>> Committers, please vote!
>>
>> Nicholas
>>
>>
>>
>>
>> - Original Message 
>> > From: Stack 
>> > To: hdfs-dev@hadoop.apache.org
>> > Sent: Tue, February 2, 2010 10:22:50 PM
>> > Subject: [VOTE] Commit HDFS-927 to both 0.20 and 0.21 branch?
>> >
>> > I'd like to open a vote on committing HDFS-927 to both hadoop branch
>> > 0.20 and to 0.21.
>> >
>> > HDFS-927 "DFSInputStream retries too many times for new block
>> > location" has an odd summary but in short, its a better HDFS-127
>> > "DFSClient block read failures cause open DFSInputStream to become
>> > unusable".  HDFS-127 is an old, popular issue that refuses to die.  We
>> > voted on having it committed to the 0.20 branch not too long ago, see
>> > http://www.mail-archive.com/hdfs-dev@hadoop.apache.org/msg00401.html,
>> > only it broke TestFsck (See http://su.pr/1nylUn) so it was reverted.
>> >
>> > High-level, HDFS-127/HDFS-927 is about fixing DFSClient so that a good
>> > read cleans out the failures count (previous failures 'stuck' though
>> > there may have been hours of successful reads betwixt).  When
>> > rolling hadoop 0.20.2 was proposed, a few fellas including myself
>> > raised a lack of HDFS-127 as an obstacle.
>> >
>> > HDFS-927 has been committed to TRUNK.
>> >
>> > I'm +1 on committing to 0.20 and to 0.21 branches.
>> >
>> > Thanks for taking the time to take a look into this issue.
>> > St.Ack
>>
>>
>


[VOTE] Commit HDFS-927 to both 0.20 and 0.21 branch?

2010-02-02 Thread Stack
I'd like to open a vote on committing HDFS-927 to both hadoop branch
0.20 and to 0.21.

HDFS-927 "DFSInputStream retries too many times for new block
location" has an odd summary but in short, its a better HDFS-127
"DFSClient block read failures cause open DFSInputStream to become
unusable".  HDFS-127 is an old, popular issue that refuses to die.  We
voted on having it committed to the 0.20 branch not too long ago, see
http://www.mail-archive.com/hdfs-dev@hadoop.apache.org/msg00401.html,
only it broke TestFsck (See http://su.pr/1nylUn) so it was reverted.

High-level, HDFS-127/HDFS-927 is about fixing DFSClient so that a good
read cleans out the failures count (previous failures 'stuck' though
there may have been hours of successful reads betwixt).  When
rolling hadoop 0.20.2 was proposed, a few fellas including myself
raised a lack of HDFS-127 as an obstacle.

HDFS-927 has been committed to TRUNK.

I'm +1 on committing to 0.20 and to 0.21 branches.

Thanks for taking the time to take a look into this issue.
St.Ack
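P.S. The essence of the fix, as a minimal sketch (illustrative names, not the
actual DFSClient code): the failure budget is reset for each block-acquire
operation instead of accumulating over the life of the stream.

{code}
// Illustrative sketch only -- not the actual DFSClient code.
class BlockFetcher {
  // corresponds to dfs.client.max.block.acquire.failures
  private final int maxBlockAcquireFailures;

  BlockFetcher(int maxBlockAcquireFailures) {
    this.maxBlockAcquireFailures = maxBlockAcquireFailures;
  }

  byte[] readBlock(long offset) throws Exception {
    int failures = 0; // reset per operation: the essence of HDFS-127/927
    while (true) {
      try {
        return tryRead(offset);
      } catch (Exception e) {
        if (++failures >= maxBlockAcquireFailures) {
          throw e; // give up on this operation only
        }
        // refetch block locations and retry; failures from earlier
        // operations no longer count against this one
      }
    }
  }

  private byte[] tryRead(long offset) throws Exception {
    throw new UnsupportedOperationException("illustrative stub");
  }
}
{code}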


[VOTE CLOSED] -> WAS -> Re: [VOTE -- Round 2] Commit hdfs-630 to 0.21?

2010-01-25 Thread Stack
Thanks all for voting (and discussing).  The ayes have it.  I'll go
commit HDFS-630 to 0.21.
St.Ack

On Mon, Jan 25, 2010 at 5:36 AM, Steve Loughran  wrote:
> Cosmin Lehene wrote:
>>
>> Steve,
>> A DoS could not be done using excludedNodes.
>>
>> The blacklisting takes place only at the DFSClient level. The NN will return a
>> list of block locations that excludes the nodes the client has decided are dead.
>> This list isn't persisted anywhere on the server. So if a client excludes the
>> entire set of DNs other clients won't be affected.
>>
>
> OK,  +1 then.
>


[VOTE -- Round 2] Commit hdfs-630 to 0.21?

2010-01-21 Thread Stack
I'd like to propose a new vote on having hdfs-630 committed to 0.21.
The first vote on this topic, initiated 12/14/2009, was sunk when Tsz Wo
(Nicholas), Sze suggested improvements. Those suggestions have since
been folded into a new version of the hdfs-630 patch.  It's this new
version of the patch -- 0001-Fix-HDFS-630-0.21-svn-2.patch -- that I'd
like us to vote on. For background on why we -- the hbase community
-- think hdfs-630 important, see the notes below from the original
call-to-vote.

I'm obviously +1.

Thanks for your consideration,
St.Ack

P.S. Regards TRUNK, after chatting with Nicholas, TRUNK was cleaned of
the previous versions of hdfs-630 and we'll likely apply
0001-Fix-HDFS-630-trunk-svn-4.patch, a version of
0001-Fix-HDFS-630-0.21-svn-2.patch that works for TRUNK that includes
the Nicholas suggestions.


On Mon, Dec 14, 2009 at 9:56 PM, stack  wrote:
> I'd like to propose a vote on having hdfs-630 committed to 0.21 (Its already
> been committed to TRUNK).
>
> hdfs-630 adds having the dfsclient pass the namenode the names of datanodes
> it's determined dead because it got a failed connection when it tried to
> contact them, etc.  This is useful in the interval between a datanode dying and
> the namenode timing out its lease.  Without this fix, the namenode can often
> give out the dead datanode as a host for a block.  If the cluster is small,
> less than 5 or 6 nodes, then it's very likely the namenode will give out the dead
> datanode as a block host.
>
> Small clusters are common in hbase, especially when folks are starting out
> or evaluating hbase.  They'll start with three or four nodes carrying both
> datanodes+hbase regionservers.  They'll experiment killing one of the slaves
> -- datanodes and regionserver -- and watch what happens.  What follows is a
> struggling dfsclient trying to create replicas where one of the datanodes
> passed to us by the namenode is dead.   DFSClient will fail and then go back to
> the namenode again, etc. (See
> https://issues.apache.org/jira/browse/HBASE-1876 for more detailed
> blow-by-blow).  HBase operation will be held up during this time and
> eventually a regionserver will shut itself down to protect itself against
> data loss if we can't successfully write to HDFS.
>
> Thanks all,
> St.Ack
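A minimal sketch of the client-side pattern hdfs-630 describes -- accumulate
the datanodes found dead while allocating one block and hand them back to the
namenode on the re-request.  The names here are illustrative stand-ins, not
the real DFSClient/NameNode classes:

{code}
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; NamenodeStub and the method names are
// stand-ins, not real HDFS classes.
class ExcludingBlockAllocator {
  interface NamenodeStub {
    List<String> locateBlock(String path, List<String> excludedNodes);
  }

  private final NamenodeStub namenode;

  ExcludingBlockAllocator(NamenodeStub namenode) {
    this.namenode = namenode;
  }

  List<String> nextBlockTargets(String path, int retries) throws Exception {
    // The exclude list is scoped to this one block allocation, matching
    // the hdfs-630 design.
    List<String> excluded = new ArrayList<String>();
    for (int attempt = 0; attempt <= retries; attempt++) {
      List<String> targets = namenode.locateBlock(path, excluded);
      String dead = firstUnreachable(targets);
      if (dead == null) {
        return targets;     // every target was connectable
      }
      excluded.add(dead);   // tell the NN not to hand this node back
    }
    throw new Exception("could not allocate a usable pipeline");
  }

  private String firstUnreachable(List<String> targets) {
    return null; // stub: a real client would try to connect to each target
  }
}
{code}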


[jira] Reopened: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.

2009-12-21 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HDFS-630:



Reopening so I can submit an improved patch.

> In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific 
> datanodes when locating the next block.
> ---
>
> Key: HDFS-630
> URL: https://issues.apache.org/jira/browse/HDFS-630
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs client
>Affects Versions: 0.21.0
>Reporter: Ruyue Ma
>Assignee: Cosmin Lehene
>Priority: Minor
> Attachments: 0001-Fix-HDFS-630-0.21-svn.patch, 
> 0001-Fix-HDFS-630-for-0.21-and-trunk-unified.patch, 
> 0001-Fix-HDFS-630-for-0.21.patch, 0001-Fix-HDFS-630-svn.patch, 
> 0001-Fix-HDFS-630-svn.patch, 0001-Fix-HDFS-630-trunk-svn-1.patch, 
> 0001-Fix-HDFS-630-trunk-svn-2.patch, HDFS-630.patch
>
>
> created from hdfs-200.
> If during a write the dfsclient sees that a block replica location for a 
> newly allocated block is not connectable, it re-requests the NN to get a 
> fresh set of replica locations for the block. It tries this 
> dfs.client.block.write.retries times (default 3), sleeping 6 seconds between 
> each retry (see DFSClient.nextBlockOutputStream).
> This setting works well when you have a reasonably sized cluster; if you have 
> few datanodes in the cluster, every retry may pick the dead datanode and 
> the above logic bails out.
> Our solution: when getting block locations from the namenode, we give the NN 
> the excluded datanodes. The list of dead datanodes is only for one block 
> allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [VOTE CANCELLED] Commit hdfs-630 to 0.21?

2009-12-20 Thread stack
Nicholas reviewed the hdfs-630 patch and made some suggestions for improvements.
 Cosmin, the patch writer, obliged.  After chatting with Nicholas and
Cosmin, I will revert the hdfs-630 patch that is in TRUNK and, if the new
patch passes hudson, will apply it instead.  I will then put up a new vote
to have the improved patch applied to 0.21.

Thanks to all who voted.
St.Ack


On Mon, Dec 14, 2009 at 9:56 PM, stack  wrote:

> I'd like to propose a vote on having hdfs-630 committed to 0.21 (Its
> already been committed to TRUNK).
>
> hdfs-630 adds having the dfsclient pass the namenode the names of datanodes
> it's determined dead because it got a failed connection when it tried to
> contact them, etc.  This is useful in the interval between a datanode dying and
> the namenode timing out its lease.  Without this fix, the namenode can often
> give out the dead datanode as a host for a block.  If the cluster is small,
> less than 5 or 6 nodes, then it's very likely the namenode will give out the dead
> datanode as a block host.
>
> Small clusters are common in hbase, especially when folks are starting out
> or evaluating hbase.  They'll start with three or four nodes carrying both
> datanodes+hbase regionservers.  They'll experiment killing one of the slaves
> -- datanodes and regionserver -- and watch what happens.  What follows is a
> struggling dfsclient trying to create replicas where one of the datanodes
> passed to us by the namenode is dead.   DFSClient will fail and then go back to
> the namenode again, etc. (See
> https://issues.apache.org/jira/browse/HBASE-1876 for more detailed
> blow-by-blow).  HBase operation will be held up during this time and
> eventually a regionserver will shut itself down to protect itself against
> data loss if we can't successfully write to HDFS.
>
> Thanks all,
> St.Ack


[VOTE] Commit hdfs-630 to 0.21?

2009-12-14 Thread stack
I'd like to propose a vote on having hdfs-630 committed to 0.21 (Its already
been committed to TRUNK).

hdfs-630 adds having the dfsclient pass the namenode the names of datanodes
it's determined dead because it got a failed connection when it tried to
contact them, etc.  This is useful in the interval between a datanode dying and
the namenode timing out its lease.  Without this fix, the namenode can often
give out the dead datanode as a host for a block.  If the cluster is small,
less than 5 or 6 nodes, then it's very likely the namenode will give out the dead
datanode as a block host.

Small clusters are common in hbase, especially when folks are starting out
or evaluating hbase.  They'll start with three or four nodes carrying both
datanodes+hbase regionservers.  They'll experiment killing one of the slaves
-- datanodes and regionserver -- and watch what happens.  What follows is a
struggling dfsclient trying to create replicas where one of the datanodes
passed to us by the namenode is dead.   DFSClient will fail and then go back to
the namenode again, etc. (See
https://issues.apache.org/jira/browse/HBASE-1876 for more detailed
blow-by-blow).  HBase operation will be held up during this time and
eventually a regionserver will shut itself down to protect itself against
data loss if we can't successfully write to HDFS.

Thanks all,
St.Ack


Commit hdfs-630 to 0.21?

2009-12-12 Thread stack
HDFS-630 is kinda critical to us over in hbase.  We'd like to get it into
0.21 (It's been committed to TRUNK).  It's probably hard to argue it's a
blocker for 0.21.  We could run a vote.  Or should we just file it against
0.21.1 hdfs and commit it after 0.21 goes out?  What would folks suggest?

Without it, a node crash (datanode+regionserver) will bring down a second
regionserver, particularly if the cluster is small (See HBASE-1876 for
a description of the play-by-play where the NN keeps giving out the dead DN as the
place to locate new blocks).  Since the bulk of hbase clusters are small --
whether evaluations, tests, or just small productions -- this issue is an
important fix for us.  If the cluster is of 5 or fewer nodes, we'll probably
recover but there'll be a period of churn.  At a minimum, mapreduce jobs
running against the cluster will fail (usually some kind of bulk upload).

St.Ack


[jira] Resolved: (HDFS-720) NPE in BlockReceiver$PacketResponder.run(BlockReceiver.java:923)

2009-10-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HDFS-720.


   Resolution: Fixed
Fix Version/s: 0.21.0

Resolving as fixed by HDFS-690.  I just ran my tests with hdfs-690 in place and 
I no longer see NPEs.  Thanks.

> NPE in BlockReceiver$PacketResponder.run(BlockReceiver.java:923)
> 
>
> Key: HDFS-720
> URL: https://issues.apache.org/jira/browse/HDFS-720
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.21.0
> Environment: Current branch-0.21 of hdfs, mapreduce, and common.  
> Here is svn info:
> URL: https://svn.apache.org/repos/asf/hadoop/hdfs/branches/branch-0.21
> Repository Root: https://svn.apache.org/repos/asf
> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
> Revision: 827883
> Node Kind: directory
> Schedule: normal
> Last Changed Author: szetszwo
> Last Changed Rev: 826906
> Last Changed Date: 2009-10-20 00:16:25 + (Tue, 20 Oct 2009)
>Reporter: stack
> Fix For: 0.21.0
>
> Attachments: dn.log
>
>
> Running some load tests on hdfs I had one of these on the DN XX.XX.XX.139:51010:
> {code}
> 2009-10-21 04:57:02,755 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Receiving block blk_6345892463926159834_1029 src: /XX,XX,XX.140:37890 dest: 
> /XX.XX.XX.139:51010
> 2009-10-21 04:57:02,829 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> PacketResponder blk_6345892463926159834_1029 1 Exception 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:923)
> at java.lang.Thread.run(Thread.java:619)
> {code}
> On XX,XX,XX.140 side, it looks like this:
> {code}
> 10-21 04:57:01,866 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Receiving block blk_6345892463926159834_1029 src: /XX.XX.XX.140:37385 dest: 
> /XX.XX.XX140:51010
> 2009-10-21 04:57:02,836 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> PacketResponder 2 for block blk_6345892463926159834_1029 terminating
> 2009-10-21 04:57:02,885 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(XX.XX.XX.140:51010, 
> storageID=DS-1292310101-208.76.44.140-51010-1256100924816, infoPort=51075, 
> ipcPort=51020):Exception writing block blk_6345892463926159834_1029 to mirror 
> XX.XX.XX.139:51010
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcher.write0(Native Method)
> at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
> at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
> at sun.nio.ch.IOUtil.write(IOUtil.java:75)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
> at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
> at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:466)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:434)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:573)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:352)
> at 
> org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:382)
> at 
> org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:323)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:111)
> at java.lang.Thread.run(Thread.java:619)
> {code}
> Here is the bit of code inside the run method:
> {code}
>  922   pkt = ackQueue.getFirst();
>  923   expected = pkt.seqno;
> {code}
> So 'pkt' is null?  But the LinkedList API says that it throws 
> NoSuchElementException if the list is empty, so you'd think we wouldn't get an 
> NPE here.  What am I missing?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HDFS-721) ERROR Block blk_XXX_1030 already exists in state RBW and thus cannot be created

2009-10-21 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HDFS-721.


Resolution: Invalid

Working as designed.  Closing.

> ERROR Block blk_XXX_1030 already exists in state RBW and thus cannot be 
> created
> ---
>
> Key: HDFS-721
> URL: https://issues.apache.org/jira/browse/HDFS-721
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.21.0
> Environment: dfs.support.append=true
> Current branch-0.21 of hdfs, mapreduce, and common. Here is svn info:
> URL: https://svn.apache.org/repos/asf/hadoop/hdfs/branches/branch-0.21
> Repository Root: https://svn.apache.org/repos/asf
> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
> Revision: 827883
> Node Kind: directory
> Schedule: normal
> Last Changed Author: szetszwo
> Last Changed Rev: 826906
> Last Changed Date: 2009-10-20 00:16:25 + (Tue, 20 Oct 2009)
>Reporter: stack
>
> Running some loading tests against hdfs branch-0.21 I got the following:
> {code}
> 2009-10-21 04:57:10,770 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Receiving block blk_6345892463926159834_1030 src: /XX.XX.XX.141:53112 dest: 
> /XX.XX.XX.140:51010
> 2009-10-21 04:57:10,771 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> writeBlock blk_6345892463926159834_1030 received exception 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> blk_6345892463926159834_1030 already exists in state RBW and thus cannot be 
> created.
> 2009-10-21 04:57:10,771 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(XX.XX.XX.140:51010, 
> storageID=DS-1292310101-XX.XX.XX.140-51010-1256100924816, infoPort=51075, 
> ipcPort=51020):DataXceiver
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> blk_6345892463926159834_1030 already exists in state RBW and thus cannot be 
> created.
> at 
> org.apache.hadoop.hdfs.server.datanode.FSDataset.createTemporary(FSDataset.java:1324)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:98)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:258)
> at 
> org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:382)
> at 
> org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:323)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:111)
> at java.lang.Thread.run(Thread.java:619)
> {code}
> On the sender side:
> {code}
> 2009-10-21 04:57:10,740 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(XX.XX.XX.141:51010, 
> storageID=DS-1870884070-XX.XX.XX.141-51010-1256100925196, infoPort=51075, 
> ipcPort=51020) Starting thread to transfer block blk_6345892463926159834_1030 
> to XX.XX.XX.140:51010
> 2009-10-21 04:57:10,770 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(XX.XX.XX.141:51010, 
> storageID=DS-1870884070-XX.XX.XX.141-51010-1256100925196, infoPort=51075, 
> ipcPort=51020):Failed to transfer blk_6345892463926159834_1030 to 
> XX.XX.XX.140:51010 got java.net.SocketException: Original Exception : 
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
> at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:415)
> at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:516)
> at 
> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:199)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:346)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:434)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1262)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.io.IOException: Connection reset by peer
> ... 8 more
> {code}
> The block sequence number, 1030, is one more than that in issue HDFS-720 
> (same test run, but about 8 seconds between errors).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-721) ERROR Block blk_XXX_1030 already exists in state RBW and thus cannot be created

2009-10-20 Thread stack (JIRA)
ERROR Block blk_XXX_1030 already exists in state RBW and thus cannot be created
---

 Key: HDFS-721
 URL: https://issues.apache.org/jira/browse/HDFS-721
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.21.0
 Environment: dfs.support.append=true

Current branch-0.21 of hdfs, mapreduce, and common. Here is svn info:

URL: https://svn.apache.org/repos/asf/hadoop/hdfs/branches/branch-0.21
Repository Root: https://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 827883
Node Kind: directory
Schedule: normal
Last Changed Author: szetszwo
Last Changed Rev: 826906
Last Changed Date: 2009-10-20 00:16:25 + (Tue, 20 Oct 2009)
Reporter: stack


Running some loading tests against hdfs branch-0.21 I got the following:

{code}
2009-10-21 04:57:10,770 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving block blk_6345892463926159834_1030 src: /XX.XX.XX.141:53112 dest: 
/XX.XX.XX.140:51010
2009-10-21 04:57:10,771 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
writeBlock blk_6345892463926159834_1030 received exception 
org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
blk_6345892463926159834_1030 already exists in state RBW and thus cannot be 
created.
2009-10-21 04:57:10,771 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(XX.XX.XX.140:51010, 
storageID=DS-1292310101-XX.XX.XX.140-51010-1256100924816, infoPort=51075, 
ipcPort=51020):DataXceiver
org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
blk_6345892463926159834_1030 already exists in state RBW and thus cannot be 
created.
at 
org.apache.hadoop.hdfs.server.datanode.FSDataset.createTemporary(FSDataset.java:1324)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:98)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:258)
at 
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:382)
at 
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:323)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:111)
at java.lang.Thread.run(Thread.java:619)
{code}

On the sender side:

{code}
2009-10-21 04:57:10,740 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(XX.XX.XX.141:51010, 
storageID=DS-1870884070-XX.XX.XX.141-51010-1256100925196, infoPort=51075, 
ipcPort=51020) Starting thread to transfer block blk_6345892463926159834_1030 
to XX.XX.XX.140:51010
2009-10-21 04:57:10,770 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(XX.XX.XX.141:51010, 
storageID=DS-1870884070-XX.XX.XX.141-51010-1256100925196, infoPort=51075, 
ipcPort=51020):Failed to transfer blk_6345892463926159834_1030 to 
XX.XX.XX.140:51010 got java.net.SocketException: Original Exception : 
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:415)
at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:516)
at 
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:199)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:346)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:434)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1262)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Connection reset by peer
... 8 more
{code}

The block sequence number, 1030, is one more than that in issue HDFS-720 (same 
test run, but about 8 seconds between errors).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-720) NPE in BlockReceiver$PacketResponder.run(BlockReceiver.java:923)

2009-10-20 Thread stack (JIRA)
NPE in BlockReceiver$PacketResponder.run(BlockReceiver.java:923)


 Key: HDFS-720
 URL: https://issues.apache.org/jira/browse/HDFS-720
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.21.0
 Environment: Current branch-0.21 of hdfs, mapreduce, and common.  Here 
is svn info:

URL: https://svn.apache.org/repos/asf/hadoop/hdfs/branches/branch-0.21
Repository Root: https://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 827883
Node Kind: directory
Schedule: normal
Last Changed Author: szetszwo
Last Changed Rev: 826906
Last Changed Date: 2009-10-20 00:16:25 + (Tue, 20 Oct 2009)
Reporter: stack


Running some load tests on hdfs I had one of these on the DN XX.XX.XX.139:51010:

{code}
2009-10-21 04:57:02,755 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving block blk_6345892463926159834_1029 src: /XX,XX,XX.140:37890 dest: 
/XX.XX.XX.139:51010
2009-10-21 04:57:02,829 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder blk_6345892463926159834_1029 1 Exception 
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:923)
at java.lang.Thread.run(Thread.java:619)
{code}

On XX,XX,XX.140 side, it looks like this:

{code}
10-21 04:57:01,866 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving block blk_6345892463926159834_1029 src: /XX.XX.XX.140:37385 dest: 
/XX.XX.XX140:51010
2009-10-21 04:57:02,836 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder 2 for block blk_6345892463926159834_1029 terminating
2009-10-21 04:57:02,885 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(XX.XX.XX.140:51010, 
storageID=DS-1292310101-208.76.44.140-51010-1256100924816, infoPort=51075, 
ipcPort=51020):Exception writing block blk_6345892463926159834_1029 to mirror 
XX.XX.XX.139:51010
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
at sun.nio.ch.IOUtil.write(IOUtil.java:75)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
at 
org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:466)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:434)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:573)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:352)
at 
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:382)
at 
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:323)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:111)
at java.lang.Thread.run(Thread.java:619)
{code}

Here is the bit of code inside the run method:
{code}
 922   pkt = ackQueue.getFirst();
 923   expected = pkt.seqno;
{code}

So 'pkt' is null?  But the LinkedList API says that it throws 
NoSuchElementException if the list is empty, so you'd think we wouldn't get an NPE 
here.  What am I missing?
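For what it's worth, the LinkedList behavior is easy to demonstrate: getFirst()
on an empty list does throw NoSuchElementException, but the list will happily
store a null element and return it.  So an NPE on 'pkt.seqno' means pkt itself
was null -- either a null got enqueued, or the list was read without proper
synchronization.  A minimal demonstration:

{code}
import java.util.LinkedList;
import java.util.NoSuchElementException;

public class GetFirstDemo {
  public static void main(String[] args) {
    LinkedList<String> q = new LinkedList<String>();
    try {
      q.getFirst();            // empty list: throws, does NOT return null
    } catch (NoSuchElementException expected) {
      System.out.println("empty list -> NoSuchElementException");
    }
    q.add(null);               // LinkedList accepts null elements
    String pkt = q.getFirst(); // returns the stored null
    System.out.println("stored null -> getFirst() returned " + pkt);
  }
}
{code}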

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [VOTE] port HDFS-127 (DFSClient block read failures cause open DFSInputStream to become unusable) to hadoop 0.20/0.21

2009-10-20 Thread stack
+1

On Mon, Oct 19, 2009 at 2:34 PM, Tsz Wo (Nicholas), Sze <
s29752-hadoop...@yahoo.com> wrote:

> DFSClient has a retry mechanism on block acquiring for read.  If the number
> of retries reaches a certain limit (defined by
> dfs.client.max.block.acquire.failures), DFSClient will throw a
> BlockMissingException back to the user application.  In the current
> implementation, DFSClient counts the failures across multiple block
> acquiring operations but the block acquiring operations are supposed to be
> independent.  HDFS-127 fixes this problem by counting the failures within a
> single operation.
>
> I propose to commit HDFS-127 to 0.20 and above since this fix is safe and
> very useful.
>
> Nicholas Sze
>
>