[jira] [Resolved] (HDFS-16684) Exclude self from JournalNodeSyncer when using a bind host

2022-08-28 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HDFS-16684.
--
Hadoop Flags: Reviewed
  Resolution: Fixed

Merged to trunk and branch-3.3. Resolving. Thanks for the nice contribution 
[~svaughan] 

> Exclude self from JournalNodeSyncer when using a bind host
> --
>
> Key: HDFS-16684
> URL: https://issues.apache.org/jira/browse/HDFS-16684
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: journal-node
>Affects Versions: 3.4.0, 3.3.9
> Environment: Running with Java 11 and bind addresses set to 0.0.0.0.
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> The JournalNodeSyncer will include the local instance in syncing when using a 
> bind host (e.g. 0.0.0.0).  There is a mechanism that is supposed to exclude 
> the local instance, but it doesn't recognize the meta-address as a local 
> address.
> Running with bind addresses set to 0.0.0.0, the JournalNodeSyncer will log 
> attempts to sync with itself as part of the normal syncing rotation.  For an 
> HA configuration running 3 JournalNodes, the "other" list used by the 
> JournalNodeSyncer will include 3 proxies.
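
The local-address check described above can be sketched as follows. This is an illustrative stand-alone example, not Hadoop's actual JournalNodeSyncer code: the point is that `java.net.InetAddress.isAnyLocalAddress()` is what recognizes the 0.0.0.0 meta-address, so a check that only compares concrete host addresses will fail to exclude self when a bind host is configured.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

public class SelfExclusionCheck {
    // Hypothetical helper: decide whether a configured JournalNode address
    // refers to this host. Treating the wildcard (0.0.0.0) as local is the
    // missing piece; without it the syncer keeps itself in the "other
    // JournalNodes" list and tries to sync with itself.
    static boolean isLocalAddress(InetSocketAddress addr) {
        InetAddress ip = addr.getAddress();
        if (ip == null) {
            return false; // unresolved hostname
        }
        // isAnyLocalAddress() is true for the 0.0.0.0/:: meta-address.
        return ip.isAnyLocalAddress() || ip.isLoopbackAddress();
    }

    public static void main(String[] args) {
        InetSocketAddress wildcard = new InetSocketAddress("0.0.0.0", 8485);
        InetSocketAddress remote = new InetSocketAddress("192.0.2.10", 8485);
        System.out.println(isLocalAddress(wildcard)); // true
        System.out.println(isLocalAddress(remote));   // false
    }
}
```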



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.3.4

2022-08-04 Thread Stack
+1 (Sorry, took me a while)

Ran: ./dev-support/hadoop-vote.sh --source
https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/

* Signature: ok

* Checksum : failed

* Rat check (17.0.1): ok

 - mvn clean apache-rat:check

* Built from source (17.0.1): ok

 - mvn clean install  -DskipTests

* Built tar from source (17.0.1): ok

 - mvn clean package  -Pdist -DskipTests -Dtar
-Dmaven.javadoc.skip=true
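
The "Checksum" line in the checklist above verifies that the SHA-512 of the downloaded tarball matches the published `.sha512` file. A minimal sketch of that comparison, using an in-memory byte string rather than a real release artifact:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class ChecksumCheck {
    // Hex-encode the SHA-512 digest of some bytes, as a .sha512 file records.
    static String sha512Hex(byte[] data) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(data)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] artifact = "release-bytes".getBytes(StandardCharsets.UTF_8);
        // In a real vote, "expected" is read from the published .sha512 file.
        String expected = sha512Hex(artifact);
        System.out.println(expected.equals(sha512Hex(artifact)) ? "OK" : "FAILED");
    }
}
```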

Took a look at website. Home page says stuff like, “ARM Support: This is
the first release to support ARM architectures.“, which I don’t think is
true of 3.3.4 but otherwise, looks fine.

Only played with HDFS. UIs looked right.

Deployed to ten node arm64 cluster. Ran the hbase verification job on top
of it and all passed. Did some kills, stuff came back.

I didn't spend time on unit tests but one set passed on a local rig here:

[image: image.png]
Stack

On Fri, Jul 29, 2022 at 11:48 AM Steve Loughran 
wrote:

> I have put together a release candidate (RC1) for Hadoop 3.3.4
>
> The RC is available at:
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/
>
> The git tag is release-3.3.4-RC1, commit a585a73c3e0
>
> The maven artifacts are staged at
> https://repository.apache.org/content/repositories/orgapachehadoop-1358/
>
> You can find my public key at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> Change log
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/CHANGELOG.md
>
> Release notes
>
> https://dist.apache.org/repos/dist/dev/hadoop/hadoop-3.3.4-RC1/RELEASENOTES.md
>
> There's a very small number of changes, primarily critical code/packaging
> issues and security fixes.
>
> See the release notes for details.
>
> Please try the release and vote. The vote will run for 5 days.
>
> steve
>


[jira] [Resolved] (HDFS-16586) Purge FsDatasetAsyncDiskService threadgroup; it causes BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal exception and exit'

2022-05-25 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HDFS-16586.
--
Fix Version/s: 3.4.0
   3.2.4
   3.3.4
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to branch-3, branch-3.3, and to branch-3.2. Thank you for the review 
[~hexiaoqiao] 

> Purge FsDatasetAsyncDiskService threadgroup; it causes 
> BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal 
> exception and exit' 
> -
>
> Key: HDFS-16586
> URL: https://issues.apache.org/jira/browse/HDFS-16586
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.3.0, 3.2.3
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The below failed block finalize is causing a downstreamer's test to fail when 
> it uses hadoop 3.2.3 or 3.3.0+:
> {code:java}
> 2022-05-19T18:21:08,243 INFO  [Command processor] 
> impl.FsDatasetAsyncDiskService(234): Scheduling blk_1073741840_1016 replica 
> FinalizedReplica, blk_1073741840_1016, FINALIZED
>   getNumBytes()     = 52
>   getBytesOnDisk()  = 52
>   getVisibleLength()= 52
>   getVolume()       = 
> /Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2
>   getBlockURI()     = 
> file:/Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2/current/BP-62743752-127.0.0.1-1653009535881/current/finalized/subdir0/subdir0/blk_1073741840
>  for deletion
> 2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] 
> metrics.TopMetrics(134): a metric is reported: cmd: delete user: stack.hfs.0 
> (auth:SIMPLE)
> 2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] 
> top.TopAuditLogger(78): --- logged event for top service: 
> allowed=true ugi=stack.hfs.0 (auth:SIMPLE) ip=/127.0.0.1 cmd=delete  
> src=/user/stack/test-data/b8167d53-bcd7-c682-a767-55faaf7f3e96/data/default/t1/4499521075f51d5138fe4f1916daf92d/.tmp
>   dst=null  perm=null
> 2022-05-19T18:21:08,243 DEBUG [PacketResponder: 
> BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, 
> type=LAST_IN_PIPELINE] datanode.BlockReceiver$PacketResponder(1645): 
> PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, 
> type=LAST_IN_PIPELINE, replyAck=seqno: 901 reply: SUCCESS 
> downstreamAckTimeNanos: 0 flag: 0
> 2022-05-19T18:21:08,243 DEBUG [PacketResponder: 
> BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, 
> type=LAST_IN_PIPELINE] datanode.BlockReceiver$PacketResponder(1327): 
> PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, 
> type=LAST_IN_PIPELINE: seqno=-2 waiting for local datanode to finish write.
> 2022-05-19T18:21:08,243 ERROR [Command processor] 
> datanode.BPServiceActor$CommandProcessingThread(1276): Command processor 
> encountered fatal exception and exit.
> java.lang.IllegalThreadStateException: null
>   at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:865) ~[?:?]
>   at java.lang.Thread.(Thread.java:430) ~[?:?]
>   at java.lang.Thread.(Thread.java:704) ~[?:?]
>   at java.lang.Thread.(Thread.java:525) ~[?:?]
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$1.newThread(FsDatasetAsyncDiskService.java:113)
>  ~[hadoop-hdfs-3.2.3.jar:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.(ThreadPoolExecutor.java:623)
>  ~[?:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:912)
>  ~[?:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1343) 
> ~[?:?]
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:189)
>  ~[hadoop-hdfs-3.2.3.jar:?]
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:238)
>  ~[hadoop-hdfs-3.2.3.jar:?]
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2184)
>  ~[hadoop-hdfs-3.2.3.jar:?]
>   at 
> org.apa

[jira] [Created] (HDFS-16586) Purge FsDatasetAsyncDiskService threadgroup; it causes BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal exception and exit'

2022-05-20 Thread Michael Stack (Jira)
Michael Stack created HDFS-16586:


 Summary: Purge FsDatasetAsyncDiskService threadgroup; it causes 
BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal 
exception and exit' 
 Key: HDFS-16586
 URL: https://issues.apache.org/jira/browse/HDFS-16586
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.2.3, 3.3.0
Reporter: Michael Stack
Assignee: Michael Stack


The below failed block finalize is causing a downstreamer's test to fail when 
it uses hadoop 3.2.3 or 3.3.0+:
{code:java}
2022-05-19T18:21:08,243 INFO  [Command processor] 
impl.FsDatasetAsyncDiskService(234): Scheduling blk_1073741840_1016 replica 
FinalizedReplica, blk_1073741840_1016, FINALIZED
  getNumBytes()     = 52
  getBytesOnDisk()  = 52
  getVisibleLength()= 52
  getVolume()       = 
/Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2
  getBlockURI()     = 
file:/Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2/current/BP-62743752-127.0.0.1-1653009535881/current/finalized/subdir0/subdir0/blk_1073741840
 for deletion
2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] 
metrics.TopMetrics(134): a metric is reported: cmd: delete user: stack.hfs.0 
(auth:SIMPLE)
2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] 
top.TopAuditLogger(78): --- logged event for top service: 
allowed=true ugi=stack.hfs.0 (auth:SIMPLE) ip=/127.0.0.1 cmd=delete  
src=/user/stack/test-data/b8167d53-bcd7-c682-a767-55faaf7f3e96/data/default/t1/4499521075f51d5138fe4f1916daf92d/.tmp
  dst=null  perm=null
2022-05-19T18:21:08,243 DEBUG [PacketResponder: 
BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE] 
datanode.BlockReceiver$PacketResponder(1645): PacketResponder: 
BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE, 
replyAck=seqno: 901 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
2022-05-19T18:21:08,243 DEBUG [PacketResponder: 
BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE] 
datanode.BlockReceiver$PacketResponder(1327): PacketResponder: 
BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE: 
seqno=-2 waiting for local datanode to finish write.
2022-05-19T18:21:08,243 ERROR [Command processor] 
datanode.BPServiceActor$CommandProcessingThread(1276): Command processor 
encountered fatal exception and exit.
java.lang.IllegalThreadStateException: null
  at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:865) ~[?:?]
  at java.lang.Thread.(Thread.java:430) ~[?:?]
  at java.lang.Thread.(Thread.java:704) ~[?:?]
  at java.lang.Thread.(Thread.java:525) ~[?:?]
  at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$1.newThread(FsDatasetAsyncDiskService.java:113)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.(ThreadPoolExecutor.java:623)
 ~[?:?]
  at 
java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:912) 
~[?:?]
  at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1343) 
~[?:?]
  at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:189)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:238)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2184)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2103)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:736)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:682)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1318)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.lambda$enqueue$2(BPServiceActor.java:1364)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1291)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1274)
 ~[hadoop-hdfs-3.2.3.jar:?]
2022-05-19T18:21:08,243 DEBUG [DataXceiver for client
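
The direction of the fix named in the summary (purging the thread group) can be sketched as below. All identifiers here are illustrative, not Hadoop's actual code: the idea is that a `ThreadFactory` which does not pass an explicit `ThreadGroup` to the `Thread` constructor cannot hit `ThreadGroup.addUnstarted()` on a destroyed group, which is where the `IllegalThreadStateException` in the trace above originates.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncDiskServiceThreads {
    // Hypothetical replacement factory: threads join the caller's group
    // implicitly instead of a dedicated, destroyable ThreadGroup.
    static ThreadFactory factory() {
        AtomicInteger counter = new AtomicInteger();
        return r -> {
            Thread t = new Thread(r, "async-disk-service-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor(factory());
        Future<String> name = pool.submit(() -> Thread.currentThread().getName());
        System.out.println(name.get());
        pool.shutdown();
    }
}
```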

[jira] [Resolved] (HDFS-16540) Data locality is lost when DataNode pod restarts in kubernetes

2022-05-15 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HDFS-16540.
--
Hadoop Flags: Reviewed
  Resolution: Fixed

Merged to branch-3.3 and to trunk.

> Data locality is lost when DataNode pod restarts in kubernetes 
> ---
>
> Key: HDFS-16540
> URL: https://issues.apache.org/jira/browse/HDFS-16540
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.2
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.4
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> We have HBase RegionServer and Hdfs DataNode running in one pod. When the pod 
> restarts, we found that data locality is lost after we do a major compaction 
> of hbase regions. After some debugging, we found that upon pod restarts, its 
> ip changes. In DatanodeManager, maps like networktopology are updated with 
> the new info. host2DatanodeMap is not updated accordingly. When hdfs client 
> with the new ip tries to find a local DataNode, it fails. 
>  
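
The stale-map behavior described above can be illustrated with a toy sketch (all names hypothetical, not the NameNode's actual `DatanodeManager` code): when a DataNode re-registers under a new IP, the host-indexed map must drop the old key before inserting the new one, or lookups by the new IP fail and locality is lost.

```java
import java.util.HashMap;
import java.util.Map;

public class Host2DatanodeMapSketch {
    private final Map<String, String> byIp = new HashMap<>(); // ip -> datanode id

    void register(String datanodeId, String oldIp, String newIp) {
        if (oldIp != null && !oldIp.equals(newIp)) {
            byIp.remove(oldIp); // the step that was missing: purge the stale entry
        }
        byIp.put(newIp, datanodeId);
    }

    String lookup(String ip) {
        return byIp.get(ip);
    }

    public static void main(String[] args) {
        Host2DatanodeMapSketch map = new Host2DatanodeMapSketch();
        map.register("dn-1", null, "10.0.0.5");
        map.register("dn-1", "10.0.0.5", "10.0.0.9"); // pod restart, new IP
        System.out.println(map.lookup("10.0.0.9")); // dn-1
        System.out.println(map.lookup("10.0.0.5")); // null (stale key purged)
    }
}
```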






Re: [VOTE] Release Apache Hadoop 3.3.3 (RC1)

2022-05-12 Thread Stack
+1 (binding)

* Signature: ok
* Checksum : ok
* Rat check (10.0.2): ok
 - mvn clean apache-rat:check
* Built from source (10.0.2): ok
 - mvn clean install  -DskipTests
* Unit tests pass (10.0.2): ok
 - mvn package -P runAllTests  -Dsurefire.rerunFailingTestsCount=3


[INFO] Apache Hadoop Cloud Storage Project  SUCCESS [  0.026 s]
[INFO]

[INFO] BUILD SUCCESS
[INFO]

[INFO] Total time:  12:51 h
[INFO] Finished at: 2022-05-12T06:25:19-07:00
[INFO]

[WARNING] The requested profile "runAllTests" could not be activated
because it does not exist.

Built a downstreamer against this RC and ran it in-the-small. Seemed fine.

S


On Wed, May 11, 2022 at 10:25 AM Steve Loughran 
wrote:

> I have put together a release candidate (RC1) for Hadoop 3.3.3
>
> The RC is available at:
> https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC1/
>
> The git tag is release-3.3.3-RC1, commit d37586cbda3
>
> The maven artifacts are staged at
> https://repository.apache.org/content/repositories/orgapachehadoop-1349/
>
> You can find my public key at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> Change log
> https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC1/CHANGELOG.md
>
> Release notes
> https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC1/RELEASENOTES.md
>
> There's a very small number of changes, primarily critical code/packaging
> issues and security fixes.
>
> * The critical fixes which shipped in the 3.2.3 release.
> * CVEs in our code and dependencies
> * Shaded client packaging issues.
> * A switch from log4j to reload4j
>
> reload4j is an active fork of the log4j 1.17 library with the classes
> which contain CVEs removed. Even though hadoop never used those classes,
> they regularly raised alerts on security scans and concern from users.
> Switching to the forked project allows us to ship a secure logging
> framework. It will complicate the builds of downstream
> maven/ivy/gradle projects which exclude our log4j artifacts, as they
> need to cut the new dependency instead/as well.
>
> See the release notes for details.
>
> This is the second release attempt. It is the same git commit as before,
> but
> fully recompiled with another republish to maven staging, which has been
> verified by building spark, as well as a minimal test project.
>
> Please try the release and vote. The vote will run for 5 days.
>
> -Steve
>
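
For downstream Maven projects that exclude Hadoop's logging artifacts as described in the note above, the exclusion now targets reload4j rather than log4j. A hypothetical `pom.xml` fragment (artifact coordinates shown are assumptions to illustrate the shape, not prescribed ones):

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>3.3.3</version>
  <exclusions>
    <!-- cut the reload4j dependency instead of / as well as log4j 1.x -->
    <exclusion>
      <groupId>ch.qos.reload4j</groupId>
      <artifactId>reload4j</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-reload4j</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```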


Re: [VOTE] Release Apache Hadoop 3.3.3

2022-05-06 Thread Stack
+1 (binding)

  * Signature: ok
  * Checksum : passed
  * Rat check (1.8.0_191): passed
   - mvn clean apache-rat:check
  * Built from source (1.8.0_191): failed
   - mvn clean install  -DskipTests
   - mvn -fae --no-transfer-progress -DskipTests -Dmaven.javadoc.skip=true
-Pnative -Drequire.openssl -Drequire.snappy -Drequire.valgrind
-Drequire.zstd -Drequire.test.libhadoop clean install
  * Unit tests pass (1.8.0_191):
- HDFS Tests passed (Didn't run more than this).

Deployed a ten node ha hdfs cluster with three namenodes and five
journalnodes. Ran a ten node hbase (older version of 2.5 branch built
against 3.3.2) against it. Tried a small verification job. Good. Ran a
bigger job with mild chaos. All seems to be working properly (recoveries,
logs look fine). Killed a namenode. Failover worked promptly. UIs look
good. Poked at the hdfs cli. Seems good.

S

On Tue, May 3, 2022 at 4:24 AM Steve Loughran 
wrote:

> I have put together a release candidate (rc0) for Hadoop 3.3.3
>
> The RC is available at:
> https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC0/
>
> The git tag is release-3.3.3-RC0, commit d37586cbda3
>
> The maven artifacts are staged at
> https://repository.apache.org/content/repositories/orgapachehadoop-1348/
>
> You can find my public key at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> Change log
> https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC0/CHANGELOG.md
>
> Release notes
> https://dist.apache.org/repos/dist/dev/hadoop/3.3.3-RC0/RELEASENOTES.md
>
> There's a very small number of changes, primarily critical code/packaging
> issues and security fixes.
>
>
>- The critical fixes which shipped in the 3.2.3 release.
>-  CVEs in our code and dependencies
>- Shaded client packaging issues.
>- A switch from log4j to reload4j
>
>
> reload4j is an active fork of the log4j 1.17 library with the classes which
> contain CVEs removed. Even though hadoop never used those classes, they
> regularly raised alerts on security scans and concern from users. Switching
> to the forked project allows us to ship a secure logging framework. It will
> complicate the builds of downstream maven/ivy/gradle projects which exclude
> our log4j artifacts, as they need to cut the new dependency instead/as
> well.
>
> See the release notes for details.
>
> This is my first release through the new docker build process, do please
> validate artifact signing  to make sure it is good. I'll be trying builds
> of downstream projects.
>
> We know there are some outstanding issues with at least one library we are
> shipping (okhttp), but I don't want to hold this release up for it. If the
> docker based release process works smoothly enough we can do a followup
> security release in a few weeks.
>
> Please try the release and vote. The vote will run for 5 days.
>
> -Steve
>


Re: [VOTE] Release Apache Hadoop 3.3.2 - RC5

2022-02-22 Thread Stack
+1

Verified checksums, signatures, and rat-check are good.

Built (RC4) locally from source and ran a small hdfs cluster with hbase on
top. Ran an hbase upload w/ chaos and verification and hdfs seemed to do
the right thing.

S

On Mon, Feb 21, 2022 at 9:17 PM Chao Sun  wrote:

> Hi all,
>
> Here's Hadoop 3.3.2 release candidate #5:
>
> The RC is available at: http://people.apache.org/~sunchao/hadoop-3.3.2-RC5
> The RC tag is at:
> https://github.com/apache/hadoop/releases/tag/release-3.3.2-RC5
> The Maven artifacts are staged at:
> https://repository.apache.org/content/repositories/orgapachehadoop-1335
>
> You can find my public key at:
> https://downloads.apache.org/hadoop/common/KEYS
>
> CHANGELOG is the only difference between this and RC4. Therefore, the tests
> I've done in RC4 are still valid:
> - Ran all the unit tests
> - Started a single node HDFS cluster and tested a few simple commands
> - Ran all the tests in Spark using the RC5 artifacts
>
> Please evaluate the RC and vote, thanks!
>
> Best,
> Chao
>


Re: [VOTE] Release Apache Hadoop 3.3.2 - RC2

2022-01-24 Thread Stack
+1 (binding)

* Signature: ok
* Checksum : ok
* Rat check (1.8.0_191): ok
 - mvn clean apache-rat:check
* Built from source (1.8.0_191): ok
 - mvn clean install  -DskipTests

Poking around in the binary, it looks good. Unpacked site. Looks right.
Checked a few links work.

Deployed over ten node cluster. Ran HBase ITBLL over it for a few hours w/
chaos. Worked like 3.3.1...

I tried to build with 3.8.1 maven and got the below.

[ERROR] Failed to execute goal on project
hadoop-yarn-applications-catalog-webapp: Could not resolve dependencies for
project
org.apache.hadoop:hadoop-yarn-applications-catalog-webapp:war:3.3.2: Failed
to collect dependencies at org.apache.solr:solr-core:jar:7.7.0 ->
org.restlet.jee:org.restlet:jar:2.3.0: Failed to read artifact descriptor
for org.restlet.
jee:org.restlet:jar:2.3.0: Could not transfer artifact
org.restlet.jee:org.restlet:pom:2.3.0 from/to maven-default-http-blocker (
http://0.0.0.0/): Blocked mirror for repositories: [maven-restlet (
http://maven.restlet.org, default, releases+snapshots), apache.snapshots (
http://repository.apache.org/snapshots, default, disabled)] -> [Help 1]

I used 3.6.3 mvn instead (looks like a simple fix).
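
The error above is Maven 3.8.1's default blocking of plain-http repositories (the `maven-default-http-blocker` mirror). One workaround, sketched as a `settings.xml` fragment, is to override that mirror with `blocked` set to `false`; note this re-enables insecure http downloads, so pinning the restlet repository to an https URL, if one is available, would be preferable:

```xml
<settings>
  <mirrors>
    <!-- Overrides Maven 3.8.1's built-in http blocker (insecure). -->
    <mirror>
      <id>maven-default-http-blocker</id>
      <mirrorOf>external:http:*</mirrorOf>
      <url>http://0.0.0.0/</url>
      <blocked>false</blocked>
    </mirror>
  </mirrors>
</settings>
```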

Thanks for packaging up this fat point release Chao Sun.

S

On Wed, Jan 19, 2022 at 9:50 AM Chao Sun  wrote:

> Hi all,
>
> I've put together Hadoop 3.3.2 RC2 below:
>
> The RC is available at:
> http://people.apache.org/~sunchao/hadoop-3.3.2-RC2/
> The RC tag is at:
> https://github.com/apache/hadoop/releases/tag/release-3.3.2-RC2
> The Maven artifacts are staged at:
> https://repository.apache.org/content/repositories/orgapachehadoop-1332
>
> You can find my public key at:
> https://downloads.apache.org/hadoop/common/KEYS
>
> I've done the following tests and they look good:
> - Ran all the unit tests
> - Started a single node HDFS cluster and tested a few simple commands
> - Ran all the tests in Spark using the RC2 artifacts
>
> Please evaluate the RC and vote, thanks!
>
> Best,
> Chao
>


Re: [VOTE] Release Apache Hadoop 3.3.1 RC3

2021-06-09 Thread Stack
+1



* Signature: ok

* Checksum : ok

* Rat check (1.8.0_191): ok

 - mvn clean apache-rat:check

* Built from source (1.8.0_191): ok

 - mvn clean install -DskipTests


Ran a ten node cluster w/ hbase on top running its verification loadings w/
(gentle) chaos. Had trouble getting the rig running but mostly pilot error
and none that I could particularly attribute to hdfs after poking in logs.

Messed in UI and shell some. Nothing untoward.

Wei-Chiu fixed broke tests over in hbase and complete runs are pretty much
there (a classic flakie seems more-so on 3.3.1... will dig in more on why).


Thanks,

S


On Tue, Jun 1, 2021 at 3:29 AM Wei-Chiu Chuang  wrote:

> Hi community,
>
> This is the release candidate RC3 of Apache Hadoop 3.3.1 line. All blocker
> issues have been resolved [1] again.
>
> There are 2 additional issues resolved for RC3:
> * Revert "MAPREDUCE-7303. Fix TestJobResourceUploader failures after
> HADOOP-16878
> * Revert "HADOOP-16878. FileUtil.copy() to throw IOException if the source
> and destination are the same
>
> There are 4 issues resolved for RC2:
> * HADOOP-17666. Update LICENSE for 3.3.1
> * MAPREDUCE-7348. TestFrameworkUploader#testNativeIO fails. (#3053)
> * Revert "HADOOP-17563. Update Bouncy Castle to 1.68. (#2740)" (#3055)
> * HADOOP-17739. Use hadoop-thirdparty 1.1.1. (#3064)
>
> The Hadoop-thirdparty 1.1.1, as previously mentioned, contains two extra
> fixes compared to hadoop-thirdparty 1.1.0:
> * HADOOP-17707. Remove jaeger document from site index.
> * HADOOP-17730. Add back error_prone
>
> *RC tag is release-3.3.1-RC3
> https://github.com/apache/hadoop/releases/tag/release-3.3.1-RC3
>
> *The RC3 artifacts are at*:
> https://home.apache.org/~weichiu/hadoop-3.3.1-RC3/
> ARM artifacts: https://home.apache.org/~weichiu/hadoop-3.3.1-RC3-arm/
>
> *The maven artifacts are hosted here:*
> https://repository.apache.org/content/repositories/orgapachehadoop-1320/
>
> *My public key is available here:*
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
>
> Things I've verified:
> * all blocker issues targeting 3.3.1 have been resolved.
> * stable/evolving API changes between 3.3.0 and 3.3.1 are compatible.
> * LICENSE and NOTICE files checked
> * RELEASENOTES and CHANGELOG
> * rat check passed.
> * Built HBase master branch on top of Hadoop 3.3.1 RC2, ran unit tests.
> * Built Ozone master on top fo Hadoop 3.3.1 RC2, ran unit tests.
> * Extra: built 50 other open source projects on top of Hadoop 3.3.1 RC2.
> Had to patch some of them due to commons-lang migration (Hadoop 3.2.0) and
> dependency divergence. Issues are being identified but so far nothing
> blocker for Hadoop itself.
>
> Please try the release and vote. The vote will run for 5 days.
>
> My +1 to start,
>
> [1] https://issues.apache.org/jira/issues/?filter=12350491
> [2]
>
> https://github.com/apache/hadoop/compare/release-3.3.1-RC1...release-3.3.1-RC3
>


Re: [VOTE] hadoop-thirdparty 1.1.0-RC0

2021-05-13 Thread Stack
+1

* I verified src tgz is signed with the key from
https://people.apache.org/keys/committer/weichiu.asc
* Verified hash.
* Built from src w/ -Prelease profile
* Checked CHANGES against git log.

S




On Thu, May 13, 2021 at 12:55 PM Wei-Chiu Chuang  wrote:

> Hello my fellow Hadoop developers,
>
> I am putting together the first release candidate (RC0) for
> Hadoop-thirdparty 1.1.0. This is going to be consumed by the upcoming
> Hadoop 3.3.1 release.
>
> The RC is available at:
> https://people.apache.org/~weichiu/hadoop-thirdparty-1.1.0-RC0/
> The RC tag in github is here:
> https://github.com/apache/hadoop-thirdparty/tree/release-1.1.0-RC0
> The maven artifacts are staged at:
> https://repository.apache.org/content/repositories/orgapachehadoop-1309/
>
> You can find my public key at:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS or
> https://people.apache.org/keys/committer/weichiu.asc
>
>
> Please try the release and vote. The vote will run for 5 days until
> 2021/05/19 at 00:00 CST.
>
> Note: Our post commit automation builds the code, and pushes the SNAPSHOT
> artifacts to central Maven, which is consumed by Hadoop trunk and
> branch-3.3, so it is a good validation that things are working properly in
> hadoop-thirdparty.
>
> Thanks,
> Wei-Chiu
>


Re: [DISCUSS] Hadoop 3.3.1 release

2021-02-08 Thread Stack
On Wed, Feb 3, 2021 at 6:41 AM Steve Loughran 
wrote:

>
> Regarding blockers : how about we have a little hackathon where we try
> and get things in. This means a promise of review time from the people with
> commit rights and other people who understand the code (Stack?)
>
>

I'm up for helping get 3.3.1 out (reviewing, hackathon, testing).
Thanks,
S




> -steve
>
> On Thu, 28 Jan 2021 at 06:48, Ayush Saxena  wrote:
>
> > +1
> > Just to mention we would need to release hadoop-thirdparty too before.
> > Presently we are using the snapshot version of it.
> >
> > -Ayush
> >
> > > On 28-Jan-2021, at 6:59 AM, Wei-Chiu Chuang 
> wrote:
> > >
> > > Hi all,
> > >
> > > Hadoop 3.3.0 was released half a year ago, and as of now we've
> > accumulated
> > > more than 400 changes in the branch-3.3. A number of downstreamers are
> > > eagerly waiting for 3.3.1 which addresses the guava version conflict
> > issue.
> > >
> > >
> >
> https://issues.apache.org/jira/issues/?filter=-1=project%20in%20(HDFS%2C%20HADOOP%2C%20YARN%2C%20MAPREDUCE)%20and%20fixVersion%20in%20(3.3.1)%20and%20status%20%3D%20Resolved%20
> > >
> > > We should start the release work for 3.3.1 before the diff becomes even
> > > larger.
> > >
> > > I believe there are  currently only two real blockers for a 3.3.1
> (using
> > > this filter
> > >
> >
> https://issues.apache.org/jira/issues/?filter=-1=project%20in%20(HDFS%2C%20HADOOP%2C%20YARN%2C%20MAPREDUCE)%20AND%20cf%5B12310320%5D%20in%20(3.3.1)%20AND%20status%20not%20in%20(Resolved)%20ORDER%20BY%20priority%20DESC
> > > )
> > >
> > >
> > >   1. HDFS-15566 <https://issues.apache.org/jira/browse/HDFS-15566>
> > >   2. HADOOP-17112 <https://issues.apache.org/jira/browse/HADOOP-17112>
> > >
> > >
> > > Is there anyone who would volunteer to be the 3.3.1 RM?
> > >
> > > Also, the HowToRelease wiki does not describe the ARM build process.
> > That's
> > > going to be important for future releases.
> >
> > -
> > To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
> >
> >
>


Re: [DISCUSS] Hadoop 3.3.1 release

2021-01-27 Thread Stack
Thanks for bringing up the topic Wei-Chiu. +1 on a 3.3.1 soon.

Was going to spend time testing

Yours,
S

On Wed, Jan 27, 2021 at 5:28 PM Wei-Chiu Chuang  wrote:

> Hi all,
>
> Hadoop 3.3.0 was released half a year ago, and as of now we've accumulated
> more than 400 changes in the branch-3.3. A number of downstreamers are
> eagerly waiting for 3.3.1 which addresses the guava version conflict issue.
>
>
> https://issues.apache.org/jira/issues/?filter=-1=project%20in%20(HDFS%2C%20HADOOP%2C%20YARN%2C%20MAPREDUCE)%20and%20fixVersion%20in%20(3.3.1)%20and%20status%20%3D%20Resolved%20
>
> We should start the release work for 3.3.1 before the diff becomes even
> larger.
>
> I believe there are  currently only two real blockers for a 3.3.1 (using
> this filter
>
> https://issues.apache.org/jira/issues/?filter=-1=project%20in%20(HDFS%2C%20HADOOP%2C%20YARN%2C%20MAPREDUCE)%20AND%20cf%5B12310320%5D%20in%20(3.3.1)%20AND%20status%20not%20in%20(Resolved)%20ORDER%20BY%20priority%20DESC
> )
>
>
>1. HDFS-15566 <https://issues.apache.org/jira/browse/HDFS-15566>
>2. HADOOP-17112 <https://issues.apache.org/jira/browse/HADOOP-17112>
>
>
>
> Is there anyone who would volunteer to be the 3.3.1 RM?
>
> Also, the HowToRelease wiki does not describe the ARM build process. That's
> going to be important for future releases.
>


[jira] [Resolved] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9

2019-07-01 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HDFS-14585.
--
Resolution: Fixed

Reapplied w/ proper commit message. Re-resolving.

> Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9
> -
>
> Key: HDFS-14585
> URL: https://issues.apache.org/jira/browse/HDFS-14585
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 2.10.0, 2.9.3
>
> Attachments: HDFS-14585.branch-2.9.v1.patch, 
> HDFS-14585.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v2.patch, 
> HDFS-14585.branch-2.v1.patch
>
>
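
The backported HDFS-8901 change adds a `read(ByteBuffer)` overload to `DFSInputStream`, letting callers fill a (possibly direct) buffer without an intermediate `byte[]` copy. The same read-into-buffer contract can be illustrated with a plain `FileChannel`; this is an analogy, not the DFSInputStream code itself:

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ByteBufferReadDemo {
    public static void main(String[] args) throws Exception {
        // Stand-in for an HDFS file: a local temp file with known contents.
        Path p = Files.createTempFile("bbread", ".dat");
        Files.write(p, "hello".getBytes(StandardCharsets.UTF_8));

        ByteBuffer buf = ByteBuffer.allocateDirect(16);
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            int n = ch.read(buf); // same shape as DFSInputStream#read(ByteBuffer)
            buf.flip();
            byte[] out = new byte[n];
            buf.get(out);
            System.out.println(new String(out, StandardCharsets.UTF_8));
        } finally {
            Files.delete(p);
        }
    }
}
```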







[jira] [Reopened] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9

2019-07-01 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HDFS-14585:
--

Reopening. Commit message was missing the JIRA # so revert and reapply with 
fixed commit message.

> Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9
> -
>
> Key: HDFS-14585
> URL: https://issues.apache.org/jira/browse/HDFS-14585
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 2.10.0, 2.9.3
>
> Attachments: HDFS-14585.branch-2.9.v1.patch, 
> HDFS-14585.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v2.patch, 
> HDFS-14585.branch-2.v1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: Please grant JIRA contributor permission to stakiar and openinx

2019-02-18 Thread Stack
I added you fellows to hadoop common and to hadoop hdfs. Shout if it doesn't
work, Zheng Hu.
S

On Mon, Feb 18, 2019 at 7:08 PM OpenInx  wrote:

> Dear hdfs-dev:
>
>stakiar has been working on this issue:
> https://issues.apache.org/jira/browse/HDFS-3246, but he
>has no permission to attach his patch and run hadoop QA.
>
> And I'm working on HBASE-21879,  which depends on the ByteBuffer pread
> interface, and I think
> it'll be a great p999-latency improvement for the 100% get/scan case in
> HBase.
>
>Could anyone help to grant the JIRA contributor permission to us? So
> we can move this task
>as fast as possible :-)
>Our JIRA ids are:  stakiar / openinx
>
>Thanks.
>


Re: [DISCUSSION] Create a branch to work on non-blocking access to HDFS

2018-05-31 Thread Stack
Just to close the loop, I just made a branch named HDFS-13572 to match the
new non-blocking issue (after some nice encouragement posted up on the
JIRA).
Thanks,
S

On Tue, May 15, 2018 at 9:30 PM, Stack  wrote:

> On Fri, May 4, 2018 at 5:47 AM, Anu Engineer 
> wrote:
>
>> Hi Stack,
>>
>>
>>
>> Why don’t we look at the design of what is being proposed?  Let us post
>> the design to HDFS-9924 and then if needed, by all means let us open a new
>> Jira.
>>
>> That will make it easy to understand the context if someone is looking at
>> HDFS-9924.
>>
>>
>>
>
> I posted a WIP design-for-discussion up on a new issue, HDFS-13572, after
> spending a bunch of time in HDFS-9924 and HADOOP-12910 (Duo had posted an
> earlier version on HDFS-9924 a while back).
>
> HDFS-9924 is stalled. It is filled with "discussion" that seems mostly to
> be behind where we'd like to take-off (i.e. whether hadoop2 or hadoop3
> first, what is an async api, what is async programming, etc.). We hope to
> 'vault' HDFS-9924 by skipping to an hadoop3/jdk8/CompletableFuture basis
> and by taking on contributor requests in HDFS-9924 -- e.g. a design first,
> dev in a feature branch, and so on -- EXCEPTing the hadoop2 targeting.
>
> Hence the new issue for a new undertaking (and to save folks having to
> wade through reams to get to the new effort).
>
>
>
>> I personally believe that it should be the developers of the feature that
>> should decide what goes in, what to call the branch etc. But It would be
>> nice to have
>>
>> some sort of continuity of HDFS-9924.
>>
>>
>>
>
> Agree with the above. I'll take care of tying HDFS-9924 over to the new
> issue.
>
> Thanks,
> St.Ack
>
>
>
>> Thanks
>>
>> Anu
>>
>>
>>
>> *From: * on behalf of Stack 
>> *Date: *Thursday, May 3, 2018 at 9:04 PM
>> *To: *Anu Engineer 
>> *Cc: *Wei-Chiu Chuang , "hdfs-dev@hadoop.apache.org"
>> 
>> *Subject: *Re: [DISCUSSION] Create a branch to work on non-blocking
>> access to HDFS
>>
>>
>>
>> Thanks for support Wei-Chiu and Anu.
>>
>>
>>
>> Thinking more on it, we should just open a new JIRA. HDFS-9924 is an old
>> branch with commits we don't need full of commentary that is, ahem, a mite
>> off-topic.  Duo can attach his design to the new issue. We can cite
>> HDFS-9924 as provenance and aggregate the discussion as launching pad for
>> the new effort in new issue.
>>
>>
>>
>> Hopefully this is agreeable,
>>
>> Thanks,
>>
>>
>>
>> S
>>
>>
>>
>> On Thu, May 3, 2018 at 1:54 PM, Anu Engineer 
>> wrote:
>>
>> Hi St.ack/Wei-Chiu,
>>
>> It is very kind of St.Ack to bring this question to HDFS Dev. I think
>> this is a good feature to have. As for the branch question,
>> HDFS-9924 branch is already open, we could just use that and I am +1 on
>> adding Duo as a branch committer.
>>
>> I am not familiar with HBase code base, I am presuming that there will be
>> some deviation from the current design
>> doc posted in HDFS-9924. Would it make sense to post a new design
>> proposal on HDFS-9924?
>>
>> --Anu
>>
>>
>>
>>
>> On 5/3/18, 9:29 AM, "Wei-Chiu Chuang"  wrote:
>>
>> Given that HBase 2 uses async output by default, the way that code is
>> maintained today in HBase is not sustainable. That piece of code
>> should be
>> maintained in HDFS. I am +1 as a participant in both communities.
>>
>> On Thu, May 3, 2018 at 9:14 AM, Stack  wrote:
>>
>> > Ok with you lot if a few of us open a branch to work on a
>> non-blocking HDFS
>> > client?
>> >
>> > Intent is to finish up the old issue "HDFS-9924 [umbrella]
>> Nonblocking HDFS
>> > Access". On the foot of this umbrella JIRA is a proposal by the
>> > heavy-lifter, Duo Zhang. Over in HBase, we have a limited async DFS
>> client
>> > (written by Duo) that we use making Write-Ahead Logs. We call it
>> > AsyncFSWAL. It was shipped as the default WAL writer in hbase-2.0.0.
>> >
>> > Let me quote Duo from his proposal at the base of HDFS-9924:
>> >
>> > We use lots of internal APIs of HDFS to implement the
>> AsyncFSWAL, so it
>> > is expected that things like HBASE-20244
>> > <https://issues.apache.org/jira/browse/HBASE-20244>
>> > ["NoSuchMethodEx

[jira] [Resolved] (HDFS-13565) [um

2018-05-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HDFS-13565.
--
Resolution: Invalid

Smile [~ebadger]

Yeah, sorry about that lads. Bad wifi. Resolving as invalid.



> [um
> ---
>
> Key: HDFS-13565
> URL: https://issues.apache.org/jira/browse/HDFS-13565
> Project: Hadoop HDFS
>  Issue Type: New Feature
>    Reporter: stack
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [DISCUSSION] Create a branch to work on non-blocking access to HDFS

2018-05-15 Thread Stack
On Fri, May 4, 2018 at 5:47 AM, Anu Engineer <aengin...@hortonworks.com>
wrote:

> Hi Stack,
>
>
>
> Why don’t we look at the design of what is being proposed?  Let us post
> the design to HDFS-9924 and then if needed, by all means let us open a new
> Jira.
>
> That will make it easy to understand the context if someone is looking at
> HDFS-9924.
>
>
>

I posted a WIP design-for-discussion up on a new issue, HDFS-13572, after
spending a bunch of time in HDFS-9924 and HADOOP-12910 (Duo had posted an
earlier version on HDFS-9924 a while back).

HDFS-9924 is stalled. It is filled with "discussion" that seems mostly to
be behind where we'd like to take-off (i.e. whether hadoop2 or hadoop3
first, what is an async api, what is async programming, etc.). We hope to
'vault' HDFS-9924 by skipping to an hadoop3/jdk8/CompletableFuture basis
and by taking on contributor requests in HDFS-9924 -- e.g. a design first,
dev in a feature branch, and so on -- EXCEPTing the hadoop2 targeting.

Hence the new issue for a new undertaking (and to save folks having to wade
through reams to get to the new effort).



> I personally believe that it should be the developers of the feature that
> should decide what goes in, what to call the branch etc. But It would be
> nice to have
>
> some sort of continuity of HDFS-9924.
>
>
>

Agree with the above. I'll take care of tying HDFS-9924 over to the new
issue.

Thanks,
St.Ack



> Thanks
>
> Anu
>
>
>
> *From: *<saint@gmail.com> on behalf of Stack <st...@duboce.net>
> *Date: *Thursday, May 3, 2018 at 9:04 PM
> *To: *Anu Engineer <aengin...@hortonworks.com>
> *Cc: *Wei-Chiu Chuang <weic...@apache.org>, "hdfs-dev@hadoop.apache.org" <
> hdfs-dev@hadoop.apache.org>
> *Subject: *Re: [DISCUSSION] Create a branch to work on non-blocking
> access to HDFS
>
>
>
> Thanks for support Wei-Chiu and Anu.
>
>
>
> Thinking more on it, we should just open a new JIRA. HDFS-9924 is an old
> branch with commits we don't need full of commentary that is, ahem, a mite
> off-topic.  Duo can attach his design to the new issue. We can cite
> HDFS-9924 as provenance and aggregate the discussion as launching pad for
> the new effort in new issue.
>
>
>
> Hopefully this is agreeable,
>
> Thanks,
>
>
>
> S
>
>
>
> On Thu, May 3, 2018 at 1:54 PM, Anu Engineer <aengin...@hortonworks.com>
> wrote:
>
> Hi St.ack/Wei-Chiu,
>
> It is very kind of St.Ack to bring this question to HDFS Dev. I think this
> is a good feature to have. As for the branch question,
> HDFS-9924 branch is already open, we could just use that and I am +1 on
> adding Duo as a branch committer.
>
> I am not familiar with HBase code base, I am presuming that there will be
> some deviation from the current design
> doc posted in HDFS-9924. Would it make sense to post a new design
> proposal on HDFS-9924?
>
> --Anu
>
>
>
>
> On 5/3/18, 9:29 AM, "Wei-Chiu Chuang" <weic...@apache.org> wrote:
>
> Given that HBase 2 uses async output by default, the way that code is
> maintained today in HBase is not sustainable. That piece of code
> should be
> maintained in HDFS. I am +1 as a participant in both communities.
>
> On Thu, May 3, 2018 at 9:14 AM, Stack <st...@duboce.net> wrote:
>
> > Ok with you lot if a few of us open a branch to work on a
> non-blocking HDFS
> > client?
> >
> > Intent is to finish up the old issue "HDFS-9924 [umbrella]
> Nonblocking HDFS
> > Access". On the foot of this umbrella JIRA is a proposal by the
> > heavy-lifter, Duo Zhang. Over in HBase, we have a limited async DFS
> client
> > (written by Duo) that we use making Write-Ahead Logs. We call it
> > AsyncFSWAL. It was shipped as the default WAL writer in hbase-2.0.0.
> >
> > Let me quote Duo from his proposal at the base of HDFS-9924:
> >
> > We use lots of internal APIs of HDFS to implement the
> AsyncFSWAL, so it
> > is expected that things like HBASE-20244
> > <https://issues.apache.org/jira/browse/HBASE-20244>
> > ["NoSuchMethodException
> > when retrieving private method decryptEncryptedDataEncryptionKey
> from
> > DFSClient"] will happen again and again.
> >
> > To make life easier, we need to move the async output related code
> into
> > HDFS. The POC [attached as patch on HDFS-9924] shows that option 3
> [1] can
> > work, so I would like to create a feature branch to implement the
> async dfs
> > client. In general I think there are 4 steps

[jira] [Created] (HDFS-13572) [umbrella] Non-blocking HDFS Access for H3

2018-05-15 Thread stack (JIRA)
stack created HDFS-13572:


 Summary: [umbrella] Non-blocking HDFS Access for H3
 Key: HDFS-13572
 URL: https://issues.apache.org/jira/browse/HDFS-13572
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: fs async
Affects Versions: 3.0.0
Reporter: stack


An umbrella JIRA for supporting non-blocking HDFS access in h3.

This issue has provenance in the stalled HDFS-9924 but would like to vault over 
what was going on over there, in particular, focus on an async API for hadoop3+ 
unencumbered by worries about how to make it work in hadoop2.

Let me post a WIP design. Would love input/feedback (We make mention of the 
HADOOP-12910 call for spec but as future work -- hopefully that's ok). Was 
thinking of cutting a feature branch if all good after a bit of chat.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-13565) [um

2018-05-15 Thread stack (JIRA)
stack created HDFS-13565:


 Summary: [um
 Key: HDFS-13565
 URL: https://issues.apache.org/jira/browse/HDFS-13565
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: stack






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[DISCUSSION] Create a branch to work on non-blocking access to HDFS

2018-05-03 Thread Stack
Ok with you lot if a few of us open a branch to work on a non-blocking HDFS
client?

Intent is to finish up the old issue "HDFS-9924 [umbrella] Nonblocking HDFS
Access". On the foot of this umbrella JIRA is a proposal by the
heavy-lifter, Duo Zhang. Over in HBase, we have a limited async DFS client
(written by Duo) that we use making Write-Ahead Logs. We call it
AsyncFSWAL. It was shipped as the default WAL writer in hbase-2.0.0.

Let me quote Duo from his proposal at the base of HDFS-9924:

We use lots of internal APIs of HDFS to implement the AsyncFSWAL, so it
is expected that things like HBASE-20244
 ["NoSuchMethodException
when retrieving private method decryptEncryptedDataEncryptionKey from
DFSClient"] will happen again and again.

To make life easier, we need to move the async output related code into
HDFS. The POC [attached as patch on HDFS-9924] shows that option 3 [1] can
work, so I would like to create a feature branch to implement the async dfs
client. In general I think there are 4 steps:

1. Implement an async rpc client with option 3 [1] described above.
2. Implement the filesystem APIs which only need to connect to NN, such as
'mkdirs'.
3. Implement async file read. The problem is the API. For pread I think a
CompletableFuture is enough, the problem is for the streaming read. Need to
discuss later.
4. Implement async file write. The API will also be a problem, but a more
important problem is that, if we want to support fan-out, the current logic
at DN side will make the semantic broken as we can read uncommitted data
very easily. In HBase it is solved by HBASE-14004
 but I do not think we
should keep the broken behavior in HDFS. We need to find a way to deal with
it.

Comments welcome.

Intent is to make a branch named HDFS-9924 (or should we just do a new
JIRA?) and to add Duo as a feature branch committer. If all goes well,
we'll call for a merge VOTE.

Thanks,
St.Ack

1. Option 3:  "Use the old protobuf rpc interface and implement a new rpc
framework. The benefit is that we also do not need port unification service
at server side and do not need to maintain two implementations at server
side. And one more thing is that we do not need to upgrade protobuf to 3.x."
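Step 3 above leaves the pread API shape open ("For pread I think a CompletableFuture is enough"). The caller-facing shape can be sketched minimally; this is a hypothetical Python analogue of the Java CompletableFuture-based API under discussion, and `AsyncReader` and its methods are invented for illustration, not HDFS code.

```python
import io
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class AsyncReader:
    """Hypothetical non-blocking positional-read facade over a blocking file.

    pread() returns a Future immediately instead of blocking the caller,
    mirroring the future-based shape discussed for step 3.
    """

    def __init__(self, raw, max_workers=4):
        self._raw = raw                    # any object with seek()/read()
        self._lock = threading.Lock()      # keep each seek+read pair atomic
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def pread(self, position, length) -> Future:
        def _read():
            with self._lock:
                self._raw.seek(position)
                return self._raw.read(length)
        # Caller gets the Future at once; the blocking read runs elsewhere.
        return self._pool.submit(_read)

    def close(self):
        self._pool.shutdown()


# Issue two positional reads without waiting on either, then collect both.
reader = AsyncReader(io.BytesIO(b"hello world"))
f1 = reader.pread(0, 5)
f2 = reader.pread(6, 5)
print(f1.result(), f2.result())  # b'hello' b'world'
reader.close()
```

A real implementation would complete the future from the RPC event loop rather than hiding a blocking call behind a thread pool; the sketch only fixes the API shape the caller sees.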


Re: [DISCUSSION] Merging HDFS-7240 Object Store (Ozone) to trunk

2017-11-03 Thread Stack
On Sat, Oct 28, 2017 at 2:00 PM, Konstantin Shvachko 
wrote:

> Hey guys,
>
> It is an interesting question whether Ozone should be a part of Hadoop.
>


I don't see a direct answer to this question. Is there one? Pardon me if
I've not seen it but I'm interested in the response.

I ask because IMO the "Hadoop" project is over-stuffed already. Just see
the length of the cc list on this email. Ozone could be standalone. It is a
coherent enough effort.

Thanks,
St.Ack





> There are two main reasons why I think it should not.
>
> 1. With close to 500 sub-tasks, with 6 MB of code changes, and with a
> sizable community behind, it looks to me like a whole new project.
> It is essentially a new storage system, with different (than HDFS)
> architecture, separate S3-like APIs. This is really great - the World sure
> needs more distributed file systems. But it is not clear why Ozone should
> co-exist with HDFS under the same roof.
>
> 2. Ozone is probably just the first step in rebuilding HDFS under a new
> architecture. With the next steps presumably being HDFS-10419 and
> HDFS-8.
> The design doc for the new architecture has never been published. I can
> only assume based on some presentations and personal communications that
> the idea is to use Ozone as a block storage, and re-implement NameNode, so
> that it stores only a partial namespace in memory, while the bulk of it
> (cold data) is persisted to a local storage.
> Such architecture makes me wonder if it solves Hadoop's main problems.
> There are two main limitations in HDFS:
>   a. The throughput of Namespace operations. Which is limited by the number
> of RPCs the NameNode can handle
>   b. The number of objects (files + blocks) the system can maintain. Which
> is limited by the memory size of the NameNode.
> The RPC performance (a) is more important for Hadoop scalability than the
> object count (b). The read RPCs being the main priority.
> The new architecture targets the object count problem, but at the expense
> of the RPC throughput, which seems to be a wrong resolution of the
> tradeoff.
> Also based on the use patterns on our large clusters we read up to 90% of
> the data we write, so cold data is a small fraction and most of it must be
> cached.
>
> To summarize:
> - Ozone is a big enough system to deserve its own project.
> - The architecture that Ozone leads to does not seem to solve the intrinsic
> problems of current HDFS.
>
> I will post my opinion in the Ozone jira. Should be more convenient to
> discuss it there for further reference.
>
> Thanks,
> --Konstantin
>
>
>
> On Wed, Oct 18, 2017 at 6:54 PM, Yang Weiwei 
> wrote:
>
> > Hello everyone,
> >
> >
> > I would like to start this thread to discuss merging Ozone (HDFS-7240) to
> > trunk. This feature implements an object store which can co-exist with
> > HDFS. Ozone is disabled by default. We have tested Ozone with cluster
> sizes
> > varying from 1 to 100 data nodes.
> >
> >
> >
> > The merge payload includes the following:
> >
> >   1.  All services, management scripts
> >   2.  Object store APIs, exposed via both REST and RPC
> >   3.  Master service UIs, command line interfaces
> >   4.  Pluggable pipeline Integration
> >   5.  Ozone File System (Hadoop compatible file system implementation,
> > passes all FileSystem contract tests)
> >   6.  Corona - a load generator for Ozone.
> >   7.  Essential documentation added to Hadoop site.
> >   8.  Version specific Ozone Documentation, accessible via service UI.
> >   9.  Docker support for ozone, which enables faster development cycles.
> >
> >
> > To build Ozone and run ozone using docker, please follow instructions in
> > this wiki page. https://cwiki.apache.org/confl
> > uence/display/HADOOP/Dev+cluster+with+docker.
> >
> >
> > We have built a passionate and diverse community to drive this feature
> > development. As a team, we have achieved significant progress in past 3
> > years since first JIRA for HDFS-7240 was opened on Oct 2014. So far, we
> > have resolved almost 400 JIRAs by 20+ contributors/committers from
> > different countries and affiliations. We also want to thank the large
> > number of community members who were supportive of our efforts and
> > contributed ideas and participated in the design of ozone.
> >
> >
> > Please share your thoughts, thanks!
> >
> >
> > -- Weiwei Yang
> >
>
>
> On Wed, Oct 18, 2017 at 6:54 PM, Yang Weiwei 
> wrote:
>
> > Hello everyone,
> >
> >
> > I would like to start this thread to discuss merging Ozone (HDFS-7240) to
> > trunk. This feature implements an object store which can co-exist with
> > HDFS. Ozone is disabled by default. We have tested Ozone with cluster
> sizes
> > varying from 1 to 100 data nodes.
> >
> >
> >
> > The merge payload includes the following:
> >
> >   1.  All services, management scripts
> >   2.  Object store APIs, exposed via both REST and RPC
> >   3.  Master service UIs, command line 

Re: Can we update protobuf's version on trunk?

2017-03-30 Thread Stack
On Thu, Mar 30, 2017 at 9:16 AM, Chris Douglas <chris.doug...@gmail.com>
wrote:

> On Wed, Mar 29, 2017 at 4:59 PM, Stack <st...@duboce.net> wrote:
> >> The former; an intermediate handler decoding, [modifying,] and
> >> encoding the record without losing unknown fields.
> >>
> >
> > I did not try this. Did you? Otherwise I can.
>
> Yeah, I did. Same format. -C
>
>
Grand.
St.Ack




> >> This looks fine. -C
> >>
> >> > Thanks,
> >> > St.Ack
> >> >
> >> >
> >> > # Using the protoc v3.0.2 tool
> >> > $ protoc --version
> >> > libprotoc 3.0.2
> >> >
> >> > # I have a simple proto definition with two fields in it
> >> > $ more pb.proto
> >> > message Test {
> >> >   optional string one = 1;
> >> >   optional string two = 2;
> >> > }
> >> >
> >> > # This is a text-encoded instance of a 'Test' proto message:
> >> > $ more pb.txt
> >> > one: "one"
> >> > two: "two"
> >> >
> >> > # Now I encode the above as a pb binary
> >> > $ protoc --encode=Test pb.proto < pb.txt > pb.bin
> >> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No
> syntax
> >> > specified for the proto file: pb.proto. Please use 'syntax =
> "proto2";'
> >> > or
> >> > 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2
> >> > syntax.)
> >> >
> >> > # Here is a dump of the binary
> >> > $ od -xc pb.bin
> >> > 0000000     030a    6e6f    1265    7403    6f77
> >> >            \n 003   o   n   e 022 003   t   w   o
> >> > 0000012
> >> >
> >> > # Here is a proto definition file that has a Test Message minus the
> >> > 'two'
> >> > field.
> >> > $ more pb_drops_two.proto
> >> > message Test {
> >> >   optional string one = 1;
> >> > }
> >> >
> >> > # Use it to decode the bin file:
> >> > $ protoc --decode=Test pb_drops_two.proto < pb.bin
> >> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No
> syntax
> >> > specified for the proto file: pb_drops_two.proto. Please use 'syntax =
> >> > "proto2";' or 'syntax = "proto3";' to specify a syntax version.
> >> > (Defaulted
> >> > to proto2 syntax.)
> >> > one: "one"
> >> > 2: "two"
> >> >
> >> > Note how the second field is preserved (absent a field name). It is
> not
> >> > dropped.
> >> >
> >> > If I change the syntax of pb_drops_two.proto to be proto3, the field
> IS
> >> > dropped.
> >> >
> >> > # Here proto file with proto3 syntax specified (had to drop the
> >> > 'optional'
> >> > qualifier -- not allowed in proto3):
> >> > $ more pb_drops_two.proto
> >> > syntax = "proto3";
> >> > message Test {
> >> >   string one = 1;
> >> > }
> >> >
> >> > $ protoc --decode=Test pb_drops_two.proto < pb.bin  > pb_drops_two.txt
> >> > $ more pb_drops_two.txt
> >> > one: "one"
> >> >
> >> >
> >> > I cannot reencode the text output using pb_drops_two.proto. It
> >> > complains:
> >> >
> >> > $ protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt >
> >> > pb_drops_two.bin
> >> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No
> syntax
> >> > specified for the proto file: pb_drops_two.proto. Please use 'syntax =
> >> > "proto2";' or 'syntax = "proto3";' to specify a syntax version.
> >> > (Defaulted
> >> > to proto2 syntax.)
> >> > input:2:1: Expected identifier, got: 2
> >> >
> >> > Proto 2.5 does same:
> >> >
> >> > $ ~/bin/protobuf-2.5.0/src/protoc --encode=Test pb_drops_two.proto <
> >> > pb_drops_two.txt > pb_drops_two.bin
> >> > input:2:1: Expected identifier.
> >> > Failed to parse input.
> >> >
> >> > St.Ack
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Wed, Mar 29, 2017 at 10:14 AM, Stack <st...@duboce.net> wrote:
> 

Re: Can we update protobuf's version on trunk?

2017-03-29 Thread Stack
On Wed, Mar 29, 2017 at 3:12 PM, Chris Douglas <chris.doug...@gmail.com>
wrote:

> On Wed, Mar 29, 2017 at 1:13 PM, Stack <st...@duboce.net> wrote:
> > Is the below evidence enough that pb3 in proto2 syntax mode does not drop
> > 'unknown' fields? (Maybe you want evidence that java tooling behaves the
> > same?)
>
> I reproduced your example with the Java tooling, including changing
> some of the fields in the intermediate representation. As long as the
> syntax is "proto2", it seems to have compatible semantics.
>
>
Thanks.


> > To be clear, when we say proxy above, are we expecting that a pb message
> > deserialized by a process down-the-line that happens to have a crimped
> proto
> > definition that is absent a couple of fields somehow can re-serialize
> and at
> > the end of the line, all fields are present? Or are we talking
> pass-through
> > of the message without rewrite?
>
> The former; an intermediate handler decoding, [modifying,] and
> encoding the record without losing unknown fields.
>
>
I did not try this. Did you? Otherwise I can.

St.Ack


> This looks fine. -C
>
> > Thanks,
> > St.Ack
> >
> >
> > # Using the protoc v3.0.2 tool
> > $ protoc --version
> > libprotoc 3.0.2
> >
> > # I have a simple proto definition with two fields in it
> > $ more pb.proto
> > message Test {
> >   optional string one = 1;
> >   optional string two = 2;
> > }
> >
> > # This is a text-encoded instance of a 'Test' proto message:
> > $ more pb.txt
> > one: "one"
> > two: "two"
> >
> > # Now I encode the above as a pb binary
> > $ protoc --encode=Test pb.proto < pb.txt > pb.bin
> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> > specified for the proto file: pb.proto. Please use 'syntax = "proto2";'
> or
> > 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2
> > syntax.)
> >
> > # Here is a dump of the binary
> > $ od -xc pb.bin
> > 0000000     030a    6e6f    1265    7403    6f77
> >            \n 003   o   n   e 022 003   t   w   o
> > 0000012
> >
> > # Here is a proto definition file that has a Test Message minus the 'two'
> > field.
> > $ more pb_drops_two.proto
> > message Test {
> >   optional string one = 1;
> > }
> >
> > # Use it to decode the bin file:
> > $ protoc --decode=Test pb_drops_two.proto < pb.bin
> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> > specified for the proto file: pb_drops_two.proto. Please use 'syntax =
> > "proto2";' or 'syntax = "proto3";' to specify a syntax version.
> (Defaulted
> > to proto2 syntax.)
> > one: "one"
> > 2: "two"
> >
> > Note how the second field is preserved (absent a field name). It is not
> > dropped.
> >
> > If I change the syntax of pb_drops_two.proto to be proto3, the field IS
> > dropped.
> >
> > # Here proto file with proto3 syntax specified (had to drop the
> 'optional'
> > qualifier -- not allowed in proto3):
> > $ more pb_drops_two.proto
> > syntax = "proto3";
> > message Test {
> >   string one = 1;
> > }
> >
> > $ protoc --decode=Test pb_drops_two.proto < pb.bin  > pb_drops_two.txt
> > $ more pb_drops_two.txt
> > one: "one"
> >
> >
> > I cannot reencode the text output using pb_drops_two.proto. It complains:
> >
> > $ protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt >
> > pb_drops_two.bin
> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> > specified for the proto file: pb_drops_two.proto. Please use 'syntax =
> > "proto2";' or 'syntax = "proto3";' to specify a syntax version.
> (Defaulted
> > to proto2 syntax.)
> > input:2:1: Expected identifier, got: 2
> >
> > Proto 2.5 does same:
> >
> > $ ~/bin/protobuf-2.5.0/src/protoc --encode=Test pb_drops_two.proto <
> > pb_drops_two.txt > pb_drops_two.bin
> > input:2:1: Expected identifier.
> > Failed to parse input.
> >
> > St.Ack
> >
> >
> >
> >
> >
> >
> > On Wed, Mar 29, 2017 at 10:14 AM, Stack <st...@duboce.net> wrote:
> >>
> >> On Tue, Mar 28, 2017 at 4:18 PM, Andrew Wang <andrew.w...@cloudera.com>
> >> wrote:
> >>>
> >>> >
> >>> > > If unknown fields are dropped, then applications

Re: Can we update protobuf's version on trunk?

2017-03-29 Thread Stack
Is the below evidence enough that pb3 in proto2 syntax mode does not drop
'unknown' fields? (Maybe you want evidence that java tooling behaves the
same?)

To be clear, when we say proxy above, are we expecting that a pb message
deserialized by a process down-the-line that happens to have a crimped
proto definition that is absent a couple of fields somehow can re-serialize
and at the end of the line, all fields are present? Or are we talking
pass-through of the message without rewrite?

Thanks,
St.Ack


# Using the protoc v3.0.2 tool
$ protoc --version
libprotoc 3.0.2

# I have a simple proto definition with two fields in it
$ more pb.proto
message Test {
  optional string one = 1;
  optional string two = 2;
}

# This is a text-encoded instance of a 'Test' proto message:
$ more pb.txt
one: "one"
two: "two"

# Now I encode the above as a pb binary
$ protoc --encode=Test pb.proto < pb.txt > pb.bin
[libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
specified for the proto file: pb.proto. Please use 'syntax = "proto2";' or
'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2
syntax.)

# Here is a dump of the binary
$ od -xc pb.bin
0000000     030a    6e6f    1265    7403    6f77
           \n 003   o   n   e 022 003   t   w   o
0000012

# Here is a proto definition file that has a Test Message minus the 'two'
field.
$ more pb_drops_two.proto
message Test {
  optional string one = 1;
}

# Use it to decode the bin file:
$ protoc --decode=Test pb_drops_two.proto < pb.bin
[libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
specified for the proto file: pb_drops_two.proto. Please use 'syntax =
"proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted
to proto2 syntax.)
one: "one"
2: "two"

Note how the second field is preserved (absent a field name). It is not
dropped.

If I change the syntax of pb_drops_two.proto to be proto3, the field IS
dropped.

# Here proto file with proto3 syntax specified (had to drop the 'optional'
qualifier -- not allowed in proto3):
$ more pb_drops_two.proto
syntax = "proto3";
message Test {
  string one = 1;
}

$ protoc --decode=Test pb_drops_two.proto < pb.bin  > pb_drops_two.txt
$ more pb_drops_two.txt
one: "one"


I cannot reencode the text output using pb_drops_two.proto. It complains:

$ protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt >
pb_drops_two.bin
[libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
specified for the proto file: pb_drops_two.proto. Please use 'syntax =
"proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted
to proto2 syntax.)
input:2:1: Expected identifier, got: 2

Proto 2.5 does same:

$ ~/bin/protobuf-2.5.0/src/protoc --encode=Test pb_drops_two.proto <
pb_drops_two.txt > pb_drops_two.bin
input:2:1: Expected identifier.
Failed to parse input.

St.Ack






On Wed, Mar 29, 2017 at 10:14 AM, Stack <st...@duboce.net> wrote:

> On Tue, Mar 28, 2017 at 4:18 PM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
>
>> >
>> > > If unknown fields are dropped, then applications proxying tokens and
>> > other
>> > >> data between servers will effectively corrupt those messages, unless
>> we
>> > >> make everything opaque bytes, which- absent the convenient,
>> prenominate
>> > >> semantics managing the conversion- obviate the compatibility
>> machinery
>> > that
>> > >> is the whole point of PB. Google is removing the features that
>> justified
>> > >> choosing PB over its alternatives. Since we can't require that our
>> > >> applications compile (or link) against our updated schema, this
>> creates
>> > a
>> > >> problem that PB was supposed to solve.
>> > >
>> > >
>> > > This is scary, and it potentially affects services outside of the
>> Hadoop
>> > > codebase. This makes it difficult to assess the impact.
>> >
>> > Stack mentioned a compatibility mode that uses the proto2 semantics.
>> > If that carries unknown fields through intermediate handlers, then
>> > this objection goes away. -C
>>
>>
>> Did some more googling, found this:
>>
>> https://groups.google.com/d/msg/protobuf/Z6pNo81FiEQ/fHkdcNtdAwAJ
>>
>> Feng Xiao appears to be a Google engineer, and suggests workarounds like
>> packing the fields into a byte type. No mention of a PB2 compatibility
>> mode. Also here:
>>
>> https://groups.google.com/d/msg/protobuf/bO2L6-_t91Q/-zIaJAR9AAAJ
>>
>> Participants say that unknown fields were dropped for automatic JSON
>> encoding, since you can't losslessly convert to JSON without knowing the
>> type.
>>
>> Unfortunately, it sounds like these are intrinsic differences with PB3.
>>
>>
> As I read it Andrew, the field-dropping happens when pb3 is running in
> proto3 'mode'. Let me try it...
>
> St.Ack
>
>
>
>> Best,
>> Andrew
>>
>
>


[jira] [Created] (HDFS-11368) LocalFS does not allow setting storage policy so spew running in local mode

2017-01-25 Thread stack (JIRA)
stack created HDFS-11368:


 Summary: LocalFS does not allow setting storage policy so spew 
running in local mode
 Key: HDFS-11368
 URL: https://issues.apache.org/jira/browse/HDFS-11368
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Minor


commit f92a14ade635e4b081f3938620979b5864ac261f
Author: Yu Li <l...@apache.org>
Date:   Mon Jan 9 09:52:58 2017 +0800

HBASE-14061 Support CF-level Storage Policy

...added setting storage policy which is nice. Being able to set storage policy 
came in in hdfs 2.6.0 (HDFS-6584 Support Archival Storage) but you can only do 
this for DFS, not for local FS.

The upshot is that, starting up hbase in standalone mode, which uses localfs, you 
get this exception every time:

{code}
2017-01-25 12:26:53,400 WARN  [StoreOpener-93375c645ef2e649620b5d8ed9375985-1] 
fs.HFileSystem: Failed to set storage policy of 
[file:/var/folders/d8/8lyxycpd129d4fj7lb684dwhgp/T/hbase-stack/hbase/data/hbase/namespace/93375c645ef2e649620b5d8ed9375985/info]
 to [HOT]
java.lang.UnsupportedOperationException: Cannot find specified method 
setStoragePolicy
at 
org.apache.hadoop.hbase.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:209)
at 
org.apache.hadoop.hbase.fs.HFileSystem.setStoragePolicy(HFileSystem.java:161)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hbase.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:207)
at 
org.apache.hadoop.hbase.regionserver.HRegionFileSystem.setStoragePolicy(HRegionFileSystem.java:198)
at org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:237)
at 
org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:5265)
at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:988)
at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:985)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchMethodException: 
org.apache.hadoop.fs.LocalFileSystem.setStoragePolicy(org.apache.hadoop.fs.Path,
 java.lang.String)
at java.lang.Class.getMethod(Class.java:1786)
at 
org.apache.hadoop.hbase.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:205)
...
{code}

It is distracting at the least. Let me fix.
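A minimal sketch of the fix direction (pure JDK, hypothetical class and method names — not the actual HBase patch): probe once for the optional method via reflection and degrade gracefully when the FileSystem does not expose it, instead of spewing a stack trace on every store open.

```java
import java.lang.reflect.Method;

public class StoragePolicyGuard {
    // Probe for an optional method; callers can skip the call (and the
    // noisy warning) when the underlying class does not provide it.
    static boolean hasMethod(Class<?> clazz, String name, Class<?>... params) {
        try {
            clazz.getMethod(name, params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // String has length() but, of course, no setStoragePolicy(String).
        System.out.println(hasMethod(String.class, "length"));
        System.out.println(hasMethod(String.class, "setStoragePolicy", String.class));
    }
}
```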



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-9187) Check if tracer is null before using it

2015-10-01 Thread stack (JIRA)
stack created HDFS-9187:
---

 Summary: Check if tracer is null before using it
 Key: HDFS-9187
 URL: https://issues.apache.org/jira/browse/HDFS-9187
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tracing
Affects Versions: 2.8.0
Reporter: stack


Saw this where an hbase that has not been updated to htrace-4.0.1 was trying to 
start:

{code}
Oct 1, 5:12:11.861 AM FATAL org.apache.hadoop.hbase.master.HMaster
Failed to become active master
java.lang.NullPointerException
at org.apache.hadoop.fs.Globber.glob(Globber.java:145)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1634)
at org.apache.hadoop.hbase.util.FSUtils.getTableDirs(FSUtils.java:1372)
at 
org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:206)
at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:619)
at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:169)
at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1481)
at java.lang.Thread.run(Thread.java:745)
{code}
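A hedged sketch of the null guard the summary asks for, using a hypothetical `Tracer` interface rather than the actual htrace API: return a no-op scope when no tracer is installed, so callers like `Globber.glob` work either way.

```java
public class TracerGuard {
    // Hypothetical stand-in for a tracing facade.
    interface Tracer { AutoCloseable newScope(String name); }

    // Returns a no-op scope when no tracer is configured, avoiding the NPE.
    static AutoCloseable maybeTrace(Tracer tracer, String name) {
        if (tracer == null) {
            return () -> { };
        }
        return tracer.newScope(name);
    }

    public static void main(String[] args) throws Exception {
        try (AutoCloseable scope = maybeTrace(null, "glob")) {
            System.out.println("glob ran without a tracer");
        }
    }
}
```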





Re: Looking to a Hadoop 3 release

2015-03-04 Thread Stack
In general +1 on 3.0.0. Its time. If we start now, it might make it out by
2016. If we start now, downstreamers can start aligning themselves to land
versions that suit at about the same time.

While two big items have been called out as possible incompatible changes,
and there is ongoing discussion as to whether they are or not*, is there
any chance of getting a longer list of big differences between the
branches? In particular I'd be interested in improvements that are 'off' by
default that would be better defaulted 'on'.

Thanks,
St.Ack

* Let me note that 'compatible' around these parts is a trampled concept
seemingly open to interpretation, with a definition other than the one that
prevails elsewhere in software. See Allen's list above, and in our
downstream project, the recent HBASE-13149 "HBase server MR tools are
broken on Hadoop 2.5+ Yarn", among others.  Let 3.x be incompatible with
2.x if only so we can leave behind all current notions of 'compatibility'
and just start over (as per Allen).


On Mon, Mar 2, 2015 at 3:19 PM, Andrew Wang andrew.w...@cloudera.com
wrote:

 Hi devs,

 It's been a year and a half since 2.x went GA, and I think we're about due
 for a 3.x release.
 Notably, there are two incompatible changes I'd like to call out, that will
 have a tremendous positive impact for our users.

 First, classpath isolation being done at HADOOP-11656, which has been a
 long-standing request from many downstreams and Hadoop users.

 Second, bumping the source and target JDK version to JDK8 (related to
 HADOOP-11090), which is important since JDK7 is EOL in April 2015 (two
 months from now). In the past, we've had issues with our dependencies
 discontinuing support for old JDKs, so this will future-proof us.

 Between the two, we'll also have quite an opportunity to clean up and
 upgrade our dependencies, another common user and developer request.

 I'd like to propose that we start rolling a series of monthly-ish series of
 3.0 alpha releases ASAP, with myself volunteering to take on the RM and
 other cat herding responsibilities. There are already quite a few changes
 slated for 3.0 besides the above (for instance the shell script rewrite) so
 there's already value in a 3.0 alpha, and the more time we give downstreams
 to integrate, the better.

 This opens up discussion about inclusion of other changes, but I'm hoping
 to freeze incompatible changes after maybe two alphas, do a beta (with no
 further incompat changes allowed), and then finally a 3.x GA. For those
 keeping track, that means a 3.x GA in about four months.

 I would also like to stress though that this is not intended to be a big
 bang release. For instance, it would be great if we could maintain wire
 compatibility between 2.x and 3.x, so rolling upgrades work. Keeping
 branch-2 and branch-3 similar also makes backports easier, since we're
 likely maintaining 2.x for a while yet.

 Please let me know any comments / concerns related to the above. If people
 are friendly to the idea, I'd like to cut a branch-3 and start working on
 the first alpha.

 Best,
 Andrew



Re: [DISCUSS] Allow continue reading from being-written file using same stream

2014-09-19 Thread Stack
On Thu, Sep 18, 2014 at 12:48 AM, Vinayakumar B vinayakum...@apache.org
wrote:

 Hi all,

 Currently *DFSInputStream* doesn't allow reading a write-in-progress file
 beyond the bytes already written by the time the input stream was opened.

 To read further update on the same file, needs to be read by opening
 another stream to the same file again.

 Instead how about refreshing length of such open files if the current
 position is at earlier EOF.


Are you talking tailing an HDFS file without having to fake it with a loop
that does open, read till EOF, close, repeat?  If so, sounds great.
St.Ack
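The "fake it with a loop" workaround mentioned above can be sketched like this — plain java.io on a local file with a hypothetical helper, not the DFS client API: re-open the file each pass and seek past what was already consumed, since a single stream stops at the EOF it observed at open time.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TailLoop {
    // Read from `offset` to the current EOF, appending lines to `out`;
    // returns the new offset to resume from on the next pass.
    static long readFrom(Path file, long offset, StringBuilder out) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            String line;
            while ((line = raf.readLine()) != null) {
                out.append(line).append('\n');
            }
            return raf.getFilePointer();
        }
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("tail", ".log");
        Files.writeString(f, "first\n");
        StringBuilder sb = new StringBuilder();
        long pos = readFrom(f, 0, sb);          // reads up to the EOF seen now
        Files.writeString(f, "second\n", StandardOpenOption.APPEND);
        readFrom(f, pos, sb);                   // re-open to see the new bytes
        System.out.print(sb);                   // both lines, in order
    }
}
```

Refreshing the length on the same open stream, as proposed, would remove the need for this open/read/close cycle.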


[jira] [Created] (HDFS-6803) Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context

2014-07-31 Thread stack (JIRA)
stack created HDFS-6803:
---

 Summary: Documenting DFSClient#DFSInputStream expectations reading 
and preading in concurrent context
 Key: HDFS-6803
 URL: https://issues.apache.org/jira/browse/HDFS-6803
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 2.4.1
Reporter: stack
 Attachments: DocumentingDFSClientDFSInputStream (1).pdf

Reviews of the patch posted on the parent task suggest that we be more explicit 
about how DFSIS is expected to behave when being read by contending threads. It 
is also suggested that presumptions made internally be made explicit by 
documenting expectations.

Before we put up a patch we've made a document of assertions we'd like to make 
into tenets of DFSInputStream.  If there is agreement, we'll attach to this issue a patch 
that weaves the assumptions into DFSIS as javadoc and class comments. 







[jira] [Created] (HDFS-6047) TestPread NPE inside in DFSInputStream hedgedFetchBlockByteRange

2014-03-03 Thread stack (JIRA)
stack created HDFS-6047:
---

 Summary: TestPread NPE inside in DFSInputStream 
hedgedFetchBlockByteRange
 Key: HDFS-6047
 URL: https://issues.apache.org/jira/browse/HDFS-6047
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: stack
Assignee: stack
 Fix For: 2.4.0


Our [~andrew.wang] saw this on an internal test cluster running trunk:

{code}
java.lang.NullPointerException: null
at 
org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1181)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1296)
at org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:78)
at 
org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:108)
at org.apache.hadoop.hdfs.TestPread.pReadFile(TestPread.java:151)
at 
org.apache.hadoop.hdfs.TestPread.testMaxOutHedgedReadPool(TestPread.java:292)
{code}

TestPread was failing.

The NPE comes of our presuming there is always a chosenNode as we set up hedged 
reads inside hedgedFetchBlockByteRange (chosenNode is null'd each time 
through the loop).  Usually there is a chosenNode, but we need to allow for the 
case where there is not.
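For illustration only, a self-contained sketch of the hedged-read idea this code path implements — start a backup attempt if the primary is slow and take whichever finishes first — using plain java.util.concurrent rather than the actual DFSInputStream internals:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class HedgedRead {
    // Wait briefly on the primary read; if it is slow, launch a backup and
    // return whichever attempt completes first.
    static String hedged(ExecutorService pool, Callable<String> primary,
                         Callable<String> backup, long hedgeAfterMs) throws Exception {
        CompletableFuture<String> first = CompletableFuture.supplyAsync(wrap(primary), pool);
        try {
            return first.get(hedgeAfterMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException slow) {
            CompletableFuture<String> second = CompletableFuture.supplyAsync(wrap(backup), pool);
            return (String) CompletableFuture.anyOf(first, second).get();
        }
    }

    static <T> Supplier<T> wrap(Callable<T> c) {
        return () -> {
            try { return c.call(); } catch (Exception e) { throw new RuntimeException(e); }
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        String r = hedged(pool,
            () -> { Thread.sleep(500); return "slow-primary"; },
            () -> "fast-backup", 50);
        System.out.println(r);  // fast-backup
        pool.shutdown();
    }
}
```

The fix the report calls for is the analogous guard in the real loop: only set up hedging when a node was actually chosen.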





Re: [VOTE] Release Apache Hadoop 2.3.0

2014-02-12 Thread Stack
+1

Downloaded, deployed to small cluster, and then ran an hbase loading on top
of it.  Looks good.

Packaging wise, is it intentional that some jars show up a few times?  I
can understand webapps bundling a copy but doesn't mapreduce depend on
commons?

share/hadoop/mapreduce/lib/hadoop-annotations-2.3.0.jar
share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/hadoop-annotations-2.3.0.jar
share/hadoop/common/lib/hadoop-annotations-2.3.0.jar

If not intentional, I can make up a better report and file an issue.

Thanks,
St.Ack




On Tue, Feb 11, 2014 at 6:49 AM, Arun C Murthy a...@hortonworks.com wrote:

 Folks,

 I've created a release candidate (rc0) for hadoop-2.3.0 that I would like
 to get released.

 The RC is available at:
 http://people.apache.org/~acmurthy/hadoop-2.3.0-rc0
 The RC tag in svn is here:
 https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.3.0-rc0

 The maven artifacts are available via repository.apache.org.

 Please try the release and vote; the vote will run for the usual 7 days.

 thanks,
 Arun

 PS: Thanks to Andrew, Vinod & Alejandro for all their help in various
 release activities.



Re: Is it me or is the new bootstrap HDFS UI HWX green? (See tip of hadoop-2.3 branch)

2014-02-04 Thread Stack
On Mon, Feb 3, 2014 at 9:26 PM, Chris Douglas cdoug...@apache.org wrote:

 ...
 Please take this offline. -C


No problem.
St.Ack


Re: Is it me or is the new bootstrap HDFS UI HWX green? (See tip of hadoop-2.3 branch)

2014-02-02 Thread Stack
Sorry for the delay.


On Wed, Jan 29, 2014 at 10:05 PM, Vinod Kumar Vavilapalli vinodkv@
hortonworks.com wrote:


 My response was to your direct association of the green color to HWX green
 as if it were deliberately done. Nobody here intentionally left a vendor's
 signature like you claimed.



I did not do what you accuse me of above.  Please go back to the original.
 All is couched in 'is it me' and 'I think'.  No accusations of deliberate
vendor insert.  That is your addition.



 And to your other comment Does the apache binary have to be compiled by
 'hortonmu'?   Could it be compiled by 'arun', or 'apachemu'? to the
 message in the build. As if somebody said it has to be.



The Apache Hadoop version string should be pure, free of vendor pollution.
 Seems obvious to me.  I could call a vote and get it written into the
bylaws but seems a bit of a useless exercise?

(This is now a non-issue anyways having been 'fixed'.  While some chose to
do histrionics, another committer spent a few minutes and committed a
patch so builds no longer have to be done on dev machines and can instead
come off Apache Infra and now version string has apache infra in it
instead... nice).



 You know how I'd have raised this? I'd say Hey guys, seems like the build
 messages have hortonmu and that seems like an issue with our branding. Can
 we fix this?. Then I or somebody could have replied Oh, that seems
 totally by mistake. Agreed, let's fix this.


Ain't this what I did give or take a bit on the wording?



 Instead, you post it in another orthogonal thread (which in itself is
 making claims of causing deliberate confusion of brand), make it look like
 an innocuous question asking if apache binary has to be compiled by the
 specific user-name.


Sorry. Seemed related to me at the time at least.  I was trying out tip of
the branch and the color made me 'sensitive' and then I tripped over the
version string (Its hard to miss being up top in our UI).


 I said 'unbelievable'. Sorry, I should have used 'disappointing'. This is
 not the way I'd post 'concerns'.


You should make up your mind.  When you waffle on your dramatic lead-in,
the 'unbelievable' becoming 'disappointing', it reads like a 'device'.
 Your reaction comes across as false, artificial, not genuine.  Just
saying...



 There is a reason why brand issues are gently discussed on private lists.
 And to think this thread is posted out in the open like this, it was me who
 was taken aback by your oh-not-so-explicit insinuations.


I do not apologize for thinking us as a community mature enough to answer a
basic it looks like X to me, what do you lot think? even if X might come
close to the bone for some of us involved here.  A simple no, you are way
off or you may have a point... and variants thereof was what I was
expecting (You did this up in the related issue, thanks for doing that, but
IMO it would have been more effective if you'd done it in this thread...).

Thanks Vinod,
St.Ack


Re: Issue with my username on my company provided dev box? (Was: …)

2014-01-30 Thread Stack
On Wed, Jan 29, 2014 at 7:31 PM, Arun C Murthy a...@hortonworks.com wrote:


 Stack,

  Apologies for the late response, I just saw this.

 On Jan 29, 2014, at 3:33 PM, Stack st...@duboce.net wrote:

 Slightly related, I just ran into this looking back at my 2.2.0 download:

 [stack@c2020 hadoop-2.2.0]$ ./bin/hadoop version
 Hadoop 2.2.0
 Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
 Compiled by hortonmu on 2013-10-07T06:28Z
 ...

 Does the apache binary have to be compiled by 'hortonmu'?   Could it be
 compiled by 'arun', or 'apachemu'?

 Thanks,
 St.Ack


  Thank you for tarring all my work here with a brush by insinuating not
 sure what for using my company provided dev machine to work on Hadoop.



What is up Arun?  A basic query gets warped unrecognizably and given a
dirty taint.  This in spite of us 'knowing' each other and having worked
together for years now on this stuff.

While it is true that I changed employers recently, my allegiance when
around these parts has always been to Apache. I've had a few different
employers during my time contributing to Hadoop (This is my 4th). I
challenge you  to find anything in my record that has me rah rah-ing my
current or previous employers.



  I'll try find a non-company provided dev machine to create future builds,
 it might take some time because I'll have to go purchase another one. Or,
 maybe, another option is to legally change my name.


Lets chat before you go to such a radical extreme.  In another thread, it
is implied that changing the build user would be a simple enough affair --
but I know nothing of your infrastructure.



  Meanwhile, while we are on this topic, I just did:

  $ git clone git://git.apache.org/hbase.git
  $ grep -ri cloudera *

  Should I file a jira to fix all refs including the following imports of
 org.cloudera.* (pasted below) … can you please help fix that? There are
 more, but I'll leave it to your discretion. Compared to my username on my
 company provided dev. box, this seems far more egregious. Do you agree?


This is another project including a third-party project.  That seems like a
tenuous connection to me but no problem if you would put it in the same
bucket.  We can file an issue, or probably better as a precursor, get to a
place where we can discuss these concerns in a civil manner and then file
the agreed-upon issues to fix (We need this lib to add tracing to hdfs IMO
-- so if this is in the way of its making it in, lets fix).



  In future, it might be useful to focus our efforts on moving the project
 forward by contributing/reviewing code/docs etc., rather than on petty
 things like usernames.


This is a common theme, that the issues I raise are 'petty' or 'trifles'
but I have trouble reconciling this assertion with the counter reaction
raised. You react as though I were throwing molotov cocktails.

Thanks,
St.Ack


[jira] [Resolved] (HDFS-5852) Change the colors on the hdfs UI

2014-01-30 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HDFS-5852.
-

Resolution: Later

 Change the colors on the hdfs UI
 

 Key: HDFS-5852
 URL: https://issues.apache.org/jira/browse/HDFS-5852
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
  Labels: webui
 Fix For: 2.3.0

 Attachments: HDFS-5852.best.txt, HDFS-5852v2.txt, 
 HDFS-5852v3-dkgreen.txt, color-rationale.png, compromise_gray.png, 
 dkgreen.png, hdfs-5852.txt, new_hdfsui_colors.png


 The HDFS UI colors are too close to HWX green.
 Here is a patch that steers clear of vendor colors.
 I made it a blocker thinking this is something we'd want to fix before we 
 release apache hadoop 2.3.0.





Re: Issue with my username on my company provided dev box? (Was: …)

2014-01-30 Thread Stack
Thanks Mohammed for the suggestion though I will say you must have a bit of
a perverse streak if you consider this 'enjoyment' -- smile.

Going back to the issue of username in our version string, it looks like
Arun won't have to buy a new machine after all. HADOOP-10313 just got
checked in, a script to build release bits up on our shared Apache
infrastructure.

Yours,
St.Ack




On Thu, Jan 30, 2014 at 4:45 PM, Mohammad Islam misla...@yahoo.com wrote:

 I was enjoying this discussion from the
 sideline.

 I strongly believe the issue could be resolved
 through in-person discussion of the related parties and move forward.
 After that meeting, a synopsis email could be sent
 if that would help and fit the bigger community.

 Regards,
 Mohammad



 On Thursday, January 30, 2014 11:32 AM, Stack st...@duboce.net wrote:

 On Wed, Jan 29, 2014 at 7:31 PM, Arun C Murthy a...@hortonworks.com
 wrote:

 
  Stack,
 
   Apologies for the late response, I just saw this.
 
  On Jan 29, 2014, at 3:33 PM, Stack st...@duboce.net wrote:
 
  Slightly related, I just ran into this looking back at my 2.2.0 download:
 
  [stack@c2020 hadoop-2.2.0]$ ./bin/hadoop version
  Hadoop 2.2.0
  Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
  Compiled by hortonmu on 2013-10-07T06:28Z
  ...
 
  Does the apache binary have to be compiled by 'hortonmu'?   Could it be
  compiled by 'arun', or 'apachemu'?
 
  Thanks,
  St.Ack
 
 
   Thank you for tarring all my work here with a brush by insinuating not
  sure what for using my company provided dev machine to work on Hadoop.
 
 

 What is up Arun?  A basic query gets warped unrecognizably and given a
 dirty taint.  This in spite of us 'knowing' each other and having worked
 together for years now on this stuff.

 While it is true that I changed employers recently, my allegiance when
 around these parts has always been to Apache. I've had a few different
 employers during my time contributing to Hadoop (This is my 4th). I
 challenge you  to find anything in my record that has me rah rah-ing my
 current or previous employers.



   I'll try find a non-company provided dev machine to create future
 builds,
  it might take some time because I'll have to go purchase another one. Or,
  maybe, another option is to legally change my name.
 
 
 Lets chat before you go to such a radical extreme.  In another thread, it
 is implied that changing the build user would be a simple enough affair --
 but I know nothing of your infrastructure.



   Meanwhile, while we are on this topic, I just did:
 
   $ git clone git://git.apache.org/hbase.git
   $ grep -ri cloudera *
 
   Should I file a jira to fix all refs including the following imports of
  org.cloudera.* (pasted below) … can you please help fix that? There are
  more, but I'll leave it to your discretion. Compared to my username on my
  company provided dev. box, this seems far more egregious. Do you agree?
 
 
 This is another project including a third-party project.  That seems like a
 tenuous connection to me but no problem if you would put it in the same
 bucket.  We can file an issue, or probably better as a precursor, get to a
 place where we can discuss these concerns in a civil manner and then file
 the agreed-upon issues to fix (We need this lib to add tracing to hdfs IMO
 -- so if this is in the way of its making it in, lets fix).



   In future, it might be useful to focus our efforts on moving the project
  forward by contributing/reviewing code/docs etc., rather than on petty
  things like usernames.
 
 
 This is a common theme, that the issues I raise are 'petty' or 'trifles'
 but I have trouble reconciling this assertion with the counter reaction
 raised. You react as though I were throwing molotov cocktails.


 Thanks,
 St.Ack



Is it me or is the new bootstrap HDFS UI HWX green? (See tip of hadoop-2.3 branch)

2014-01-29 Thread Stack
What do others think?  See here if you do not have access:
http://goo.gl/j05wkf

It might be a shade darker but I can't tell for sure.  It looks way too
close to me.

I wouldn't think we'd intentionally go out of our way to put a vendor's signature
color on our Apache software.

Asking here before I file a blocker in case it is just a case of color
blindness on my part.

Thanks,
St.Ack


Re: Is it me or is the new bootstrap HDFS UI HWX green? (See tip of hadoop-2.3 branch)

2014-01-29 Thread Stack
On Wed, Jan 29, 2014 at 3:01 PM, Stack st...@duboce.net wrote:

 What do others think?  See here if you do not have access:
 http://goo.gl/j05wkf

 It might be a shade darker but I can't tell for sure.  It looks way too
 close to me.

 I wouldn't think we'd intentionally go out of our way to put a vendor's signature
 color on our Apache software.

 Asking here before I file a blocker in case it is just a case of color
 blindness on my part.


Slightly related, I just ran into this looking back at my 2.2.0 download:

[stack@c2020 hadoop-2.2.0]$ ./bin/hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
...

Does the apache binary have to be compiled by 'hortonmu'?   Could it be
compiled by 'arun', or 'apachemu'?

Thanks,
St.Ack


[jira] [Created] (HDFS-5852) Change the colors on the hdfs UI

2014-01-29 Thread stack (JIRA)
stack created HDFS-5852:
---

 Summary: Change the colors on the hdfs UI
 Key: HDFS-5852
 URL: https://issues.apache.org/jira/browse/HDFS-5852
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: stack
Priority: Blocker
 Fix For: 2.3.0


The HDFS UI colors are too close to HWX green.

Here is a patch that steers clear of vendor colors.

I made it a blocker thinking this is something we'd want to fix before we release 
apache hadoop 2.3.0.





Re: Re-swizzle 2.3

2014-01-29 Thread Stack
I filed https://issues.apache.org/jira/browse/HDFS-5852 as a blocker.  See
what ye all think.

Thanks,
St.Ack


On Wed, Jan 29, 2014 at 3:52 PM, Aaron T. Myers a...@cloudera.com wrote:

 I just filed this JIRA as a blocker for 2.3:
 https://issues.apache.org/jira/browse/HADOOP-10310

 The tl;dr is that JNs will not work with security enabled without this fix.
 If others don't think that supporting QJM with security enabled warrants a
 blocker for 2.3, then we can certainly lower the priority, but it seems
 pretty important to me.

 Best,
 Aaron

 --
 Aaron T. Myers
 Software Engineer, Cloudera


 On Wed, Jan 29, 2014 at 6:24 PM, Andrew Wang andrew.w...@cloudera.com
 wrote:

  I just finished tuning up branch-2.3 and fixing up the HDFS and Common
  CHANGES.txt in trunk, branch-2, and branch-2.3. I had to merge back a few
  JIRAs committed between the swizzle and now where the fix version was 2.3
  but weren't in branch-2.3.
 
  I think the only two HDFS and Common JIRAs that are marked for 2.4 are
  these:
 
  HDFS-5842 Cannot create hftp filesystem when using a proxy user ugi and a
  doAs on a secure cluster
  HDFS-5781 Use an array to record the mapping between FSEditLogOpCode and
  the corresponding byte value
 
  Jing, these both look safe to me if you want to merge them back, or I can
  just do it.
 
  Thanks,
  Andrew
 
  On Wed, Jan 29, 2014 at 1:21 PM, Doug Cutting cutt...@apache.org
 wrote:
  
   On Wed, Jan 29, 2014 at 12:30 PM, Jason Lowe jl...@yahoo-inc.com
  wrote:
 It is a bit concerning that the JIRA history showed that the target
  version
was set at some point in the past but no record of it being cleared.
  
   Perhaps the version itself was renamed?
  
   Doug
 



Re: Is it me or is the new bootstrap HDFS UI HWX green? (See tip of hadoop-2.3 branch)

2014-01-29 Thread Stack
On Wed, Jan 29, 2014 at 4:09 PM, Alejandro Abdelnur t...@cloudera.com wrote:

 IMO we should't be distributing binaries. And if we do so,they should be
 built by a Jenkins job.


That would address the second item above.

I filed https://issues.apache.org/jira/browse/HDFS-5852 for the color issue
with a suggested alternative color scheme.

St.Ack


Re: Is it me or is the new bootstrap HDFS UI HWX green? (See tip of hadoop-2.3 branch)

2014-01-29 Thread Stack
On Wed, Jan 29, 2014 at 8:48 PM, Suresh Srinivas sur...@hortonworks.com wrote:

   Please be more civil in your communique.  Your attack dog 'flair' has
  likely ruined my little survey.  No one is going to comment afraid that
  they'll get their heads cut off.
 

 Right next to imploring civil communique, you add an ad hominem.



Do you seriously want to pursue this out in a public dev forum? It is
entertaining I'm sure, if you are not involved, but I for one am a little
disturbed by how this thread has gone and would like to leave it tout de
suite (For those interested, an issue was filed off this thread where the
back and forth has been more civil, and constructive, than what you see out
here: https://issues.apache.org/jira/browse/HDFS-5852).

I can call you offline if you would like to duke this out especially given
I have a different opinion on who started up the ad hominem. On your hopes
that I'll 'cleanup other projects', I don't have the stomach for it.

Thanks,
St.Ack


Re: Is it me or is the new bootstrap HDFS UI HWX green? (See tip of hadoop-2.3 branch)

2014-01-29 Thread Stack
On Wed, Jan 29, 2014 at 9:07 PM, Joe Bounour jboun...@ddn.com wrote:

 Hello

 I find it fascinating how all the HWX folks jumped on Stack (not taking any
 side, I am Swiss/French), many against one.
 As a developer, it seems not a relevant topic, true, but to be fair,
 Hortonworks and Cloudera claim most contributors or other fame, so I can see a
 point where a little sensitivity in staying neutral as much as possible is
 not a bad idea.

 If the build is tagged Cloudera-foo, I can imagine the same email
 explosion with other folks.
 So true, it is not super relevant, but a little neutrality is not that
 bad.

 My 2 cents, now Shoot me... :)


Thanks for the view from the outside Joe.  We are usually better behaved
than what you've seen here.
St.Ack


Re: [VOTE] Release Apache Hadoop 2.1.0-beta

2013-08-21 Thread Stack
On Wed, Aug 21, 2013 at 1:25 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote:

 St.Ack wrote:

  + Once I figured where the logs were, found that JAVA_HOME was not being
  exported (don't need this in hadoop-2.0.5 for instance).  Adding an
  exported JAVA_HOME to my running shell, which doesn't seem right, but it took
  care of it (I gave up pretty quick on messing w/
  yarn.nodemanager.env-whitelist and yarn.nodemanager.admin-env -- I wasn't
  getting anywhere)

 I thought that we were always supposed to have JAVA_HOME set when
 running any of these commands.  At least, I do.  How else can the
 system disambiguate between different Java installs?  I need 2
 installs to test with JDK7.



That is fair enough, but I did not need to define this explicitly previously
(for hadoop-2.0.5-alpha, for instance): the JAVA_HOME figured in the start
scripts was propagated then and is not now (I have not dug in).



  + This did not seem to work for me:
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback</value>

 We've seen this before.  I think your problem is that you have
 java.library.path set correctly (what System.loadLibrary checks), but
 your system library path does not include a necessary dependency of
 libhadoop.so-- most likely, libjvm.so.  Probably, we should fix
 NativeCodeLoader to actually make a function call in libhadoop.so
 before it declares everything OK.


My expectation was that if native group lookup fails, as it does here, then
the 'Fallback' would kick in and we'd do the Shell query.  This mechanism
does not seem to be working.
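The fallback selection expected here can be sketched as a simple guard, using hypothetical interfaces rather than the real Hadoop classes: prefer the native implementation only when its library actually loaded (and, per Colin's suggestion, ideally after exercising a real call in it), otherwise fall through to the shell-based lookup.

```java
import java.util.List;

public class FallbackMapping {
    // Hypothetical stand-in for a group-lookup strategy.
    interface GroupMapping { List<String> getGroups(String user); }

    // Choose the native mapping only when the native library is known good;
    // otherwise silently use the shell-based mapping.
    static GroupMapping select(boolean nativeLoaded,
                               GroupMapping nativeImpl, GroupMapping shellImpl) {
        return nativeLoaded ? nativeImpl : shellImpl;
    }

    public static void main(String[] args) {
        GroupMapping shell = user -> List.of("users");
        GroupMapping broken = user -> { throw new UnsatisfiedLinkError("anchorNative"); };
        // With the native library missing, the shell mapping answers.
        System.out.println(select(false, broken, shell).getGroups("stack"));
    }
}
```

The failure reported above suggests the real check passed (the .so loaded) even though linking its symbols later failed, which is why making an actual function call before declaring the native path OK would help.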


St.Ack


Re: [VOTE] Release Apache Hadoop 2.1.0-beta

2013-08-20 Thread Stack
On Thu, Aug 15, 2013 at 2:15 PM, Arun C Murthy a...@hortonworks.com wrote:

 Folks,

 I've created a release candidate (rc2) for hadoop-2.1.0-beta that I would
 like to get released - this fixes the bugs we saw since the last go-around
 (rc1).

 The RC is available at:
 http://people.apache.org/~acmurthy/hadoop-2.1.0-beta-rc2/
 The RC tag in svn is here:
 http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.1.0-beta-rc2

 The maven artifacts are available via repository.apache.org.

 Please try the release and vote; the vote will run for the usual 7 days.


It basically works (in insecure mode), +1.

+ Checked signature.
+ Ran on small cluster w/ small load made using mapreduce interfaces.
+ Got the HBase full unit test suite to pass on top of it.

I had the following issues getting it to all work. I don't know if they are
known issues so will just list them here first.

+ I could not find documentation on how to go from tarball to running
cluster (the bundled 'cluster' and 'standalone' doc are not about how to
get this tarball off the ground).
+ I had a bit of a struggle putting this release in place under hbase unit
tests.  The container would just exit w/ 127 errcode.  No logs in expected
place.  Tripped over where minimrcluster was actually writing.  Tried to
corral it so it played nicely w/ our general test setup but found that the
new mini clusters have 'target' hardcoded as output dirs.
+ Once I figured where the logs were, found that JAVA_HOME was not being
exported (I didn't need this in hadoop-2.0.5, for instance).  Adding an
exported JAVA_HOME to my running shell doesn't seem right, but it took
care of it (I gave up pretty quick on messing w/
yarn.nodemanager.env-whitelist and yarn.nodemanager.admin-env -- I wasn't
getting anywhere)
+ This did not seem to work for me:
<name>hadoop.security.group.mapping</name>
<value>org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback</value>.
 It just did this:

Caused by: java.lang.UnsatisfiedLinkError:
org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V
at org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native
Method)
at
org.apache.hadoop.security.JniBasedUnixGroupsMapping.clinit(JniBasedUnixGroupsMapping.java:49)
at
org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.init(JniBasedUnixGroupsMappingWithFallback.java:38)

..so I replaced it
w/ org.apache.hadoop.security.ShellBasedUnixGroupsMapping on the hbase-side
to get my cluster up and running.

+ Untarring the bin tarball, it unpacks as hadoop-X.Y.Z-beta.  Untarring the src
tarball, it unpacks as hadoop-X.Y.Z-beta-src.  I'd have thought they would unpack
into the one directory, overlaying each other.

St.Ack


Re: I'm interested in working with HDFS-4680. Can somebody be a mentor?

2013-07-17 Thread Stack
Folks over at HBase would be interested in helping out.

What does a mentor have to do?  I poked around the icfoss link but didn't
see a list of duties (I've been known to be certified blind on occasion).

I am not up on the malleability of hdfs RPC; is it just a matter of adding
the trace info to a pb header record or would it require more (Sanjay was
saying something recently off-list that trace id is imminent -- but I've
not done the digging)?
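For illustration, the Dapper-style id propagation discussed above can be sketched in a few lines. Everything below is an assumption for the sketch -- names, fields, and layout are not the eventual HDFS/HTrace wire format -- but it shows the minimum a pb header record would need to carry: a trace id, a fresh 64-bit span id, and the parent's span id.

```java
import java.util.concurrent.ThreadLocalRandom;

public class TraceSpan {
    final long traceId;   // constant across the whole trace
    final long spanId;    // unique per RPC
    final long parentId;  // span id of the caller; 0 for the root

    TraceSpan(long traceId, long spanId, long parentId) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentId = parentId;
    }

    static TraceSpan newRoot() {
        long id = ThreadLocalRandom.current().nextLong();
        return new TraceSpan(id, id, 0L);
    }

    // Called when a client issues a downstream RPC on behalf of this span:
    // same trace id, fresh span id, parent pointer back to the caller.
    TraceSpan child() {
        return new TraceSpan(traceId, ThreadLocalRandom.current().nextLong(), spanId);
    }

    public static void main(String[] args) {
        TraceSpan root = TraceSpan.newRoot();
        TraceSpan rpc = root.child();
        System.out.println(rpc.traceId == root.traceId && rpc.parentId == root.spanId); // prints true
    }
}
```

With those three longs in each RPC header, the server side can reassemble the call tree offline, which is the core of the Dapper model.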

St.Ack


On Wed, Jul 17, 2013 at 1:44 PM, Sreejith Ramakrishnan 
sreejith.c...@gmail.com wrote:

 Hey,

 I was originally researching options to work on ACCUMULO-1197. Basically,
 it was a bid to pass trace functionality through the DFSClient. I discussed
 with the guys over there on implementing a Google Dapper-style trace with
 HTrace. The guys at HBase are also trying to achieve the same HTrace
 integration [HBASE-6449]

 But, that meant adding stuff to the RPC in HDFS. For a start, we've to add
 a 64-bit span-id to every RPC with tracing enabled. There's some more in
 the original Dapper paper and HTrace documentation.

 I was told by the Accumulo people to talk with and seek help from the
 experts at HDFS. I'm open to suggestions.

 Additionally, I'm participating in a Joint Mentoring Programme by Apache
 which is quite similar to GSoC. Luciano Resende (Community Development,
 Apache) is in charge of the programme. I'll attach a link. The last date is
 19th July. So, I'm pretty tensed without any mentors :(

 [1] https://issues.apache.org/jira/browse/ACCUMULO-1197
 [2] https://issues.apache.org/jira/browse/HDFS-4680
 [3] https://github.com/cloudera/htrace
 [4] http://community.apache.org/mentoringprogramme-icfoss-pilot.html
 [5] https://issues.apache.org/jira/browse/HBASE-6449

 Thank you,
 Sreejith R



[jira] [Created] (HDFS-4580) 0.95 site build failing with 'maven-project-info-reports-plugin: Could not find goal 'dependency-info''

2013-03-08 Thread stack (JIRA)
stack created HDFS-4580:
---

 Summary: 0.95 site build failing with 
'maven-project-info-reports-plugin: Could not find goal 'dependency-info''
 Key: HDFS-4580
 URL: https://issues.apache.org/jira/browse/HDFS-4580
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: stack


Our report plugin is 2.4.  Says that 'dependency-info' is new since 2.5 on the 
mvn report page:


project-info-reports:dependency-info (new in 2.5) is used to generate code 
snippets to be added to build tools.

http://maven.apache.org/plugins/maven-project-info-reports-plugin/

Let me try upgrading our reports plugin.  I tried reproducing locally, running the 
same mvn version, but it just works for me.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: VOTE: HDFS-347 merge

2013-02-17 Thread Stack
+1


On Sun, Feb 17, 2013 at 1:48 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote:

 Hi all,

 I would like to merge the HDFS-347 branch back to trunk.  It's been
 under intensive review and testing for several months.  The branch
 adds a lot of new unit tests, and passes Jenkins as of 2/15 [1]

 We have tested HDFS-347 with both random and sequential workloads. The
 short-circuit case is substantially faster [2], and overall
 performance looks very good.  This is especially encouraging given
 that the initial goal of this work was to make security compatible
 with short-circuit local reads, rather than to optimize the
 short-circuit code path.  We've also stress-tested HDFS-347 on a
 number of clusters.

 This initial VOTE is to merge only into trunk.  Just as we have done
 with our other recent merges, we will consider merging into branch-2
 after the code has been in trunk for a few weeks.

 Please cast your vote by EOD Sunday 2/24.

 best,
 Colin McCabe

 [1]
 https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13579704page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13579704

 [2]
 https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13551755page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13551755



Re: Release numbering for branch-2 releases

2013-02-04 Thread Stack
On Mon, Feb 4, 2013 at 10:46 AM, Arun C Murthy a...@hortonworks.com wrote:

 Would it better to have 2.0.3-alpha, 2.0.4-beta and then make 2.1 as a
 stable release? This way we just have one series (2.0.x) which is not
 suitable for general consumption.



That contains the versioning damage to the 2.0.x set.  This is an
improvement over the original proposal, where we let the versioning mayhem
run out to 2.3.

Thanks Arun,
St.Ack


Re: Release numbering for branch-2 releases

2013-02-02 Thread Stack
On Fri, Feb 1, 2013 at 3:03 AM, Tom White t...@cloudera.com wrote:

 On Wed, Jan 30, 2013 at 11:32 PM, Vinod Kumar Vavilapalli
 vino...@hortonworks.com wrote:
  I still have a list of pending API/protocol cleanup in YARN that need to
 be
  in before we even attempt supporting compatibility further down the road.


YARN requires changing HDFS/MapReduce API/wire-protocol?  Can't it be done
in hadoop 3.x?



 Just caught up with the discussion on the referred JIRAs. I can clearly
 see
  how a single release with an umbrella alpha/beta tag is causing tensions
  *only* because we have a single project and product. More reinforcement
 for
  my proclivity towards separate releases and by extension towards the
  projects' split.

 Good point. There's nothing to stop us doing separate releases of
 sub-project components now. Doing so might help us find
 incompatibilities between the different components in a release line
 (2.x at the moment).



I like the sound of this.  So, if HDFS, say, went unscathed by the higher
level API and wire-protocol machinations, it could make its way out to a
2.0.0 (or 2.0.4) absent the -beta/-alpha tails?

That'd help us downstreamers (As is, just trying to explain our now
out-of-date hadoop dependency is a couple of pages of the hbase reference
guide [1] -- and we haven't started in on how you'd run against hadoop2).

Thanks,
St.Ack
1. http://hbase.apache.org/book.html#hadoop


Re: Release numbering for branch-2 releases

2013-01-30 Thread Stack
On Tue, Jan 29, 2013 at 12:56 PM, Arun C Murthy a...@hortonworks.com wrote:

 Folks,

  There has been some discussions about incompatible changes in the
 hadoop-2.x.x-alpha releases on HADOOP-9070, HADOOP-9151, HADOOP-9192 and
 few other jiras. Frankly, I'm surprised about some of them since the
 'alpha' moniker was precisely to harden apis by changing them if necessary,
 borne out by the fact that every  single release in hadoop-2 chain has had
 incompatible changes. This happened since we were releasing early, moving
 fast and breaking things. Furthermore, we'll have more in future as move
 towards stability of hadoop-2 similar to HDFS-4362, HDFS-4364 et al in HDFS
 and YARN-142 (api changes) for YARN.

  So, rather than debate more, I had a brief chat with Suresh and Todd.
 Todd suggested calling the next release as hadoop-2.1.0-alpha to indicate
 the incompatibility a little better. This makes sense to me, as long as we
 are clear that we won't make any further *feature* releases in hadoop-2.0.x
 series (obviously we might be forced to do security/bug-fix release).

  Going forward, I'd like to start locking down apis/protocols for a 'beta'
 release. This way we'll have one *final* opportunity post
 hadoop-2.1.0-alpha to make incompatible changes if necessary and we can
 call it hadoop-2.2.0-beta.

  Post hadoop-2.2.0-beta we *should* lock down and not allow incompatible
 changes. This will allow us to go on to a hadoop-2.3.0 as a GA release.
 This forces us to do a real effort on making sure we lock down for
 hadoop-2.2.0-beta.

  In summary:
  # I plan to now release hadoop-2.1.0-alpha (this week).
  # We make a real effort to lock down apis/protocols and release
 hadoop-2.2.0-beta, say in March.
  # Post 'beta' release hadoop-2.3.0 as 'stable' sometime in May.

  I'll start a separate thread on 'locking protocols' w.r.t
 client-protocols v/s internal protocols (to facilitate rolling upgrades
 etc.), let's discuss this one separately.

  Makes sense?



No.

I find the above opaque and written in a cryptic language that I might grok
if I spent a day or two running over cited issues trying to make some
distillation of the esotericia debated therein.  If you want feedback from
other than the cognescenti, I would suggest a better summation of what all
is involved.  I think jargon is fine for arcane technical discussion but it
seems we are talking basic hadoop versioning here and if I am following at
all, we are talking about possibly breaking API (?) and even wire protocol
inside a major version: i.e. between 2.0.x to 2.3.x say (give or take an
-alpha or -beta suffix thrown in here and there).  Does this have to be?
 Can't we do API changes and wire protocol changes off in hadoop 3.x and
4.x, etc.?  As is, how is a little ol' downstream project like the one I
work on supposed to cope w/ this plethora of 2.X.X-{alpha,beta,?}, with
each new 2.x possibly a whole new 'experience'?

Thanks Arun,
St.Ack


[jira] [Created] (HDFS-4239) Means of telling the datanode to stop using a sick disk

2012-11-29 Thread stack (JIRA)
stack created HDFS-4239:
---

 Summary: Means of telling the datanode to stop using a sick disk
 Key: HDFS-4239
 URL: https://issues.apache.org/jira/browse/HDFS-4239
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: stack


If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing 
occasionally, or just exhibiting high latency -- your choices are:

1. Decommission the total datanode.  If the datanode is carrying 6 or 12 disks 
of data, especially on a cluster that is smallish -- 5 to 20 nodes -- the 
rereplication of the downed datanode's data can be pretty disruptive, 
especially if the cluster is doing low latency serving: e.g. hosting an hbase 
cluster.

2. Stop the datanode, unmount the bad disk, and restart the datanode (You can't 
unmount the disk while it is in use).  This latter is better in that only the 
bad disk's data is rereplicated, not all datanode data.

Is it possible to do better -- say, send the datanode a signal to tell it to stop 
using a disk an operator has designated 'bad'?  This would be like option #2 
above minus the need to stop and restart the datanode.  Ideally the disk would 
become unmountable after a while.

Nice to have would be being able to tell the datanode to restart using a disk 
after it's been replaced.
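For illustration only, the per-volume admin signal asked for here might look something like the sketch below. Every name in it is hypothetical -- no such Hadoop API exists in this description -- but it shows the state transition an operator-driven "stop using this disk" command would toggle, so that only the offlined volume's blocks need re-replication.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class VolumeManager {
    enum State { ONLINE, OFFLINE }

    private final Map<String, State> volumes = new ConcurrentHashMap<>();

    void addVolume(String mountPoint) { volumes.put(mountPoint, State.ONLINE); }

    // What an operator's "designate this disk bad" signal would flip: the
    // datanode stops placing new blocks on the volume, and once in-flight
    // readers drain, the mount can be unmounted without a datanode restart.
    void takeOffline(String mountPoint) { volumes.replace(mountPoint, State.OFFLINE); }

    // The "restart using a disk after it's been replaced" nice-to-have.
    void bringOnline(String mountPoint) { volumes.replace(mountPoint, State.ONLINE); }

    boolean isWritable(String mountPoint) {
        return volumes.get(mountPoint) == State.ONLINE;
    }

    public static void main(String[] args) {
        VolumeManager vm = new VolumeManager();
        vm.addVolume("/data/1");
        vm.addVolume("/data/2");
        vm.takeOffline("/data/2"); // operator designates the sick disk
        System.out.println(vm.isWritable("/data/1") + " " + vm.isWritable("/data/2")); // prints true false
    }
}
```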





--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4203) After recoverFileLease, datanode gets stuck complaining block '...has out of date GS ... may already be committed'

2012-11-16 Thread stack (JIRA)
stack created HDFS-4203:
---

 Summary: After recoverFileLease, datanode gets stuck complaining 
block '...has out of date GS may already be committed'
 Key: HDFS-4203
 URL: https://issues.apache.org/jira/browse/HDFS-4203
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.1.0
Reporter: stack


After calling recoverFileLease, an append to a file gets stuck retrying this:

{code}
2012-11-16 13:06:14,298 DEBUG [IPC Server handler 2 on 53224] 
namenode.PendingReplicationBlocks(92): Removing pending replication for 
blockblk_-3222397051272483489_1006
2012-11-16 13:06:43,881 WARN  [DataStreamer for file /hbase/hlog/hlog.dat.2 
block blk_-3222397051272483489_1003] hdfs.DFSClient$DFSOutputStream(3216): 
Error Recovery for block blk_-3222397051272483489_1003 bad datanode[0] 
127.0.0.1:53228
2012-11-16 13:06:43,881 WARN  [DataStreamer for file /hbase/hlog/hlog.dat.2 
block blk_-3222397051272483489_1003] hdfs.DFSClient$DFSOutputStream(3267): 
Error Recovery for block blk_-3222397051272483489_1003 in pipeline 
127.0.0.1:53228, 127.0.0.1:53231: bad datanode 127.0.0.1:53228
2012-11-16 13:06:43,884 INFO  [IPC Server handler 1 on 53233] 
datanode.DataNode(2123): Client calls 
recoverBlock(block=blk_-3222397051272483489_1003, targets=[127.0.0.1:53231])
2012-11-16 13:06:43,884 DEBUG [IPC Server handler 1 on 53233] 
datanode.FSDataset(2143): Interrupting active writer threads for block 
blk_-3222397051272483489_1006
2012-11-16 13:06:43,884 DEBUG [IPC Server handler 1 on 53233] 
datanode.FSDataset(2159): getBlockMetaDataInfo successful 
block=blk_-3222397051272483489_1006 length 120559 genstamp 1006
2012-11-16 13:06:43,884 DEBUG [IPC Server handler 1 on 53233] 
datanode.DataNode(2039): block=blk_-3222397051272483489_1003, (length=120559), 
syncList=[BlockRecord(info=BlockRecoveryInfo(block=blk_-3222397051272483489_1006
 wasRecoveredOnStartup=false) node=127.0.0.1:53231)], closeFile=false
2012-11-16 13:06:43,885 INFO  [IPC Server handler 2 on 53224] 
namenode.FSNamesystem(5468): blk_-3222397051272483489_1003 has out of date GS 
1003 found 1006, may already be committed
2012-11-16 13:06:43,885 ERROR [IPC Server handler 2 on 53224] 
security.UserGroupInformation(1139): PriviledgedActionException as:stack 
cause:java.io.IOException: blk_-3222397051272483489_1003 has out of date GS 
1003 found 1006, may already be committed
2012-11-16 13:06:43,885 ERROR [IPC Server handler 1 on 53233] 
security.UserGroupInformation(1139): PriviledgedActionException 
as:blk_-3222397051272483489_1003 cause:org.apache.hadoop.ipc.RemoteException: 
java.io.IOException: blk_-3222397051272483489_1003 has out of date GS 1003 
found 1006, may already be committed
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.nextGenerationStampForBlock(FSNamesystem.java:5469)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.nextGenerationStamp(NameNode.java:781)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

2012-11-16 13:06:43,886 WARN  [DataStreamer for file /hbase/hlog/hlog.dat.2 
block blk_-3222397051272483489_1003] hdfs.DFSClient$DFSOutputStream(3292): 
Failed recovery attempt #1 from primary datanode 127.0.0.1:53231
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.ipc.RemoteException: 
java.io.IOException: blk_-3222397051272483489_1003 has out of date GS 1003 
found 1006, may already be committed
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.nextGenerationStampForBlock(FSNamesystem.java:5469)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.nextGenerationStamp(NameNode.java:781)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java

[jira] [Resolved] (HDFS-4184) Add new interface for Client to provide more information

2012-11-12 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HDFS-4184.
-

Resolution: Invalid

Resolving as invalid since there is not enough detail.

The JIRA subject and description do not seem to match.  As per Ted in the previous 
issue, please add more detail when you create an issue so we can better know what 
you are referring to.  Meantime I'm closing this.  Open a new one with a better 
specification (this seems to require a particular version of hadoop, etc.).

Thanks Binlijin.

 Add new interface for Client to provide more information
 

 Key: HDFS-4184
 URL: https://issues.apache.org/jira/browse/HDFS-4184
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: binlijin

 When hbase reads or writes the hlog we can use 
 dfs.datanode.drop.cache.behind.reads and dfs.datanode.drop.cache.behind.writes; 
 when hbase reads an hfile during compaction we can use readahead and so on... 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Reopened] (HDFS-4184) Add the ability for Client to provide more hint information for DataNode to manage the OS buffer cache more accurate

2012-11-12 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HDFS-4184:
-


Here, I reopened it for you (in case you can't)

 Add the ability for Client to provide more hint information for DataNode to 
 manage the OS buffer cache more accurate
 

 Key: HDFS-4184
 URL: https://issues.apache.org/jira/browse/HDFS-4184
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: binlijin

 HDFS now has the ability to use posix_fadvise and sync_data_range syscalls to 
 manage the OS buffer cache.
 {code}
 When hbase reads the hlog we can set dfs.datanode.drop.cache.behind.reads 
 to true to drop data out of the buffer cache when performing sequential reads.
 When hbase writes the hlog we can set dfs.datanode.drop.cache.behind.writes to 
 true to drop data out of the buffer cache after writing.
 When hbase reads an hfile during compaction we can set 
 dfs.datanode.readahead.bytes to a non-zero value to trigger readahead for 
 sequential reads.
 and so on... 
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [VOTE] Merge HDFS-3077 (QuorumJournalManager) branch to trunk

2012-09-27 Thread Stack
On Wed, Sep 26, 2012 at 4:21 PM, Konstantin Shvachko
shv.had...@gmail.com wrote:
 Don't understand your argument. Else where?

You suggest users should download HDFS and then go to another project
(or subproject) -- i.e. 'elsewhere' -- to get something fundamental: a fix for
the SPOF.  IMO, the SPOF fix belongs in HDFS core.

St.Ack


Re: [VOTE] Merge HDFS-3077 (QuorumJournalManager) branch to trunk

2012-09-26 Thread Stack
On Tue, Sep 25, 2012 at 11:21 PM, Konstantin Shvachko
shv.had...@gmail.com wrote:
 I think this is a great work, Todd.
 And I think we should not merge it into trunk or other branches.
 As I suggested earlier on this list I think this should be spinned off
 as a separate project or a subproject.


I'd be -1 on that.

Users shouldn't have to go elsewhere to get a fix for SPOF.

St.Ack


[jira] [Created] (HDFS-2296) If read error while lease is being recovered, client reverts to stale view on block info

2011-08-29 Thread stack (JIRA)
If read error while lease is being recovered, client reverts to stale view on 
block info


 Key: HDFS-2296
 URL: https://issues.apache.org/jira/browse/HDFS-2296
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.20-append, 0.22.0, 0.23.0
Reporter: stack
Priority: Critical


We are seeing the following issue around recoverLease over in hbaselandia.  
DFSClient calls recoverLease to assume ownership of a file.  The recoverLease 
returns to the client but it can take time for the new state to propagate.  
Meantime, an incoming read fails though it's using updated block info.  
Thereafter all read retries fail because on exception we revert to stale block 
view and we never recover.  Laxman reports this issue in the below mailing 
thread:

See this thread for first report of this issue: 
http://search-hadoop.com/m/S1mOHFRmgk2/%2527FW%253A+Handling+read+failures+during+recovery%2527subj=FW+Handling+read+failures+during+recovery

Chatting w/ Hairong offline, she suggests this is a general issue around lease 
recovery no matter how it is triggered (new recoverLease or not).

I marked this critical.  At least over in hbase it is, since we get stuck 
here recovering a crashed server.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-1948) Forward port 'hdfs-1520 lightweight namenode operation to trigger lease recovery'

2011-05-16 Thread stack (JIRA)
Forward port 'hdfs-1520 lightweight namenode operation to trigger lease 
recovery'
--

 Key: HDFS-1948
 URL: https://issues.apache.org/jira/browse/HDFS-1948
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: stack


This issue is about forward porting from branch-0.20-append the little namenode 
api that facilitates stealing of a file's lease.  The forward port would be an 
adaption of hdfs-1520 and its companion patches, hdfs-1555 and hdfs-1554, to 
suit the TRUNK.

Intent is to get this fix into 0.22, time willing; I'll run a vote to get an OK on 
getting it added to the branch.  HBase needs this facility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Please join me in welcoming the following people as committers to the Hadoop project

2011-01-05 Thread Stack
Congrats lads.
St.Ack

On Wed, Jan 5, 2011 at 7:40 PM, Ian Holsman had...@holsman.net wrote:
 On behalf of the Apache Hadoop PMC, I would like to extend a warm welcome to 
 the following people,
 who have all chosen to accept the role of committers on Hadoop.

 In no alphabetical order:

 - Aaron Kimball
 - Allen Wittenauer
 - Amar Kamat
 - Dmytro Molkov
 - Jitendra Pandey
 - Kan Zhang
 - Ravi Gummadi
 - Sreekanth Ramakrishna
 - Todd Lipcon

 I appreciate all the hard work these people have put into the project so far, 
 and look forward to the contributions they will make to Hadoop in the 
 future

 Well done guys!


 --Ian




[MOTION PASSED, VOTE CLOSED] WAS - Re: [VOTE] Commit hdfs-1024 to 0.20 branch

2010-04-05 Thread Stack
Thanks to all who participated in the vote.

I'll commit in a minute.

St.Ack


On Fri, Apr 2, 2010 at 10:38 AM, Stack st...@duboce.net wrote:
 Please vote on committing HDFS-1024 to the hadoop 0.20 branch.

 Background:

 HDFS-1024 fixes possible trashing of fsimage because of a failed copy
 from the 2NN to the NN.  Ordinarily, possible corruption of this proportion
 would merit commit w/o need of a vote, only Dhruba correctly notes that
 UNLESS both NN and 2NN are upgraded, HDFS-1024 becomes an incompatible
 change (the NN-2NN communication will fail always).  IMO, this
 incompatible change can be plastered over with a release note; e.g.
 WARNING, you MUST update NN and 2NN when you go to 0.20.3 hadoop.  If
 you agree with me, please vote +1 on commit.

 Thanks,
 St.Ack



[VOTE] Commit hdfs-1024 to 0.20 branch

2010-04-02 Thread Stack
Please vote on committing HDFS-1024 to the hadoop 0.20 branch.

Background:

HDFS-1024 fixes possible trashing of fsimage because of a failed copy
from the 2NN to the NN.  Ordinarily, possible corruption of this proportion
would merit commit w/o need of a vote, only Dhruba correctly notes that
UNLESS both NN and 2NN are upgraded, HDFS-1024 becomes an incompatible
change (the NN-2NN communication will fail always).  IMO, this
incompatible change can be plastered over with a release note; e.g.
WARNING, you MUST update NN and 2NN when you go to 0.20.3 hadoop.  If
you agree with me, please vote +1 on commit.

Thanks,
St.Ack


Re: [VOTE] Commit HDFS-927 to both 0.20 and 0.21 branch?

2010-02-08 Thread Stack
Vote is closed (unless there is objection).  I'll commit the below in the next
day or so.
Thanks to all who participated.
St.Ack

On Mon, Feb 8, 2010 at 11:26 AM, Todd Lipcon t...@cloudera.com wrote:
 Given people have had several days to vote, and there have been no
 -1s, this should be good to go in, right? We have two HDFS committer
 +1s (Stack and Nicholas) and nonbinding +1s from several others.

 Thanks
 -Todd

 On Thu, Feb 4, 2010 at 1:30 PM, Tsz Wo (Nicholas), Sze
 s29752-hadoop...@yahoo.com wrote:

 This is a friendly reminder for voting on committing HDFD-927 to 0.20 and 
 0.21.

 Comiitters, please vote!

 Nicholas




 - Original Message 
  From: Stack st...@duboce.net
  To: hdfs-dev@hadoop.apache.org
  Sent: Tue, February 2, 2010 10:22:50 PM
  Subject: [VOTE] Commit HDFS-927 to both 0.20 and 0.21 branch?
 
  I'd like to open a vote on committing HDFS-927 to both hadoop branch
  0.20 and to 0.21.
 
  HDFS-927 DFSInputStream retries too many times for new block
  location has an odd summary but in short, it's a better HDFS-127
  DFSClient block read failures cause open DFSInputStream to become
  unusable.  HDFS-127 is an old, popular issue that refuses to die.  We
  voted on having it committed to the 0.20 branch not too long ago, see
  http://www.mail-archive.com/hdfs-dev@hadoop.apache.org/msg00401.html,
  only it broke TestFsck (See http://su.pr/1nylUn) so it was reverted.
 
  High-level, HDFS-127/HDFS-927 is about fixing DFSClient so a good
  read clears out the failures count (previous failures 'stuck' though
  there may have been hours of successful reads in between).  When
  rolling hadoop 0.20.2 was proposed, a few fellas including myself
  raised a lack of HDFS-127 as an obstacle.
 
  HDFS-927 has been committed to TRUNK.
 
  I'm +1 on committing to 0.20 and to 0.21 branches.
 
  Thanks for taking the time to take a look into this issue.
  St.Ack





[VOTE] Commit HDFS-927 to both 0.20 and 0.21 branch?

2010-02-02 Thread Stack
I'd like to open a vote on committing HDFS-927 to both hadoop branch
0.20 and to 0.21.

HDFS-927 "DFSInputStream retries too many times for new block
location" has an odd summary but in short, it's a better HDFS-127
"DFSClient block read failures cause open DFSInputStream to become
unusable".  HDFS-127 is an old, popular issue that refuses to die.  We
voted on having it committed to the 0.20 branch not too long ago, see
http://www.mail-archive.com/hdfs-dev@hadoop.apache.org/msg00401.html,
only it broke TestFsck (See http://su.pr/1nylUn) so it was reverted.

High-level, HDFS-127/HDFS-927 is about fixing DFSClient so a good
read clears out the failures count (previous failures 'stuck' though
there may have been hours of successful reads in between).  When
rolling hadoop 0.20.2 was proposed, a few fellas including myself
raised the lack of HDFS-127 as an obstacle.
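The "good reads clear the failure count" behavior can be sketched abstractly as below. The names are illustrative, not the actual DFSClient fields; the point is only that successes reset the counter, so stale failures from hours ago can't push a long-lived stream over its retry limit.

```java
public class ReadRetryTracker {
    private final int maxFailures;
    private int failures;

    ReadRetryTracker(int maxFailures) { this.maxFailures = maxFailures; }

    // Returns false once consecutive (un-cleared) failures exceed the limit,
    // i.e. the point at which the old code would give up for good.
    boolean recordFailure() {
        return ++failures <= maxFailures;
    }

    // The HDFS-127/HDFS-927 idea: a successful read wipes the slate clean.
    void recordSuccess() { failures = 0; }

    public static void main(String[] args) {
        ReadRetryTracker t = new ReadRetryTracker(3);
        t.recordFailure();
        t.recordFailure();
        t.recordSuccess();                     // hours of good reads in between
        System.out.println(t.recordFailure()); // prints true: still under the limit
    }
}
```

Without the reset, the same sequence would leave the stream one failure from unusable no matter how many reads succeeded in between.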

HDFS-927 has been committed to TRUNK.

I'm +1 on committing to 0.20 and to 0.21 branches.

Thanks for taking the time to take a look into this issue.
St.Ack


[VOTE -- Round 2] Commit hdfs-630 to 0.21?

2010-01-21 Thread Stack
I'd like to propose a new vote on having hdfs-630 committed to 0.21.
The first vote on this topic, initiated 12/14/2009, was sunk by Tsz Wo
(Nicholas) Sze's suggested improvements.  Those suggestions have since
been folded into a new version of the hdfs-630 patch.  It's this new
version of the patch -- 0001-Fix-HDFS-630-0.21-svn-2.patch -- that I'd
like us to vote on.  For background on why we -- the hbase community
-- think hdfs-630 is important, see the notes below from the original
call-to-vote.

I'm obviously +1.

Thanks for you consideration,
St.Ack

P.S. Regards TRUNK, after chatting with Nicholas, TRUNK was cleaned of
the previous versions of hdfs-630 and we'll likely apply
0001-Fix-HDFS-630-trunk-svn-4.patch, a version of
0001-Fix-HDFS-630-0.21-svn-2.patch that works for TRUNK and includes
Nicholas' suggestions.


On Mon, Dec 14, 2009 at 9:56 PM, stack st...@duboce.net wrote:
 I'd like to propose a vote on having hdfs-630 committed to 0.21 (Its already
 been committed to TRUNK).

 hdfs-630 adds having the dfsclient pass the namenode the names of datanodes
 it has determined dead because it got a failed connection when it tried to
 contact them, etc.  This is useful in the interval between a datanode dying and
 the namenode timing out its lease.  Without this fix, the namenode can often
 give out the dead datanode as a host for a block.  If the cluster is small,
 less than 5 or 6 nodes, then its very likely namenode will give out the dead
 datanode as a block host.

 Small clusters are common in hbase, especially when folks are starting out
 or evaluating hbase.  They'll start with three or four nodes carrying both
 datanodes+hbase regionservers.  They'll experiment killing one of the slaves
 -- datanodes and regionserver -- and watch what happens.  What follows is a
 struggling dfsclient trying to create replicas where one of the datanodes
 passed us by the namenode is dead.   DFSClient will fail and then go back to
 the namenode again, etc. (See
 https://issues.apache.org/jira/browse/HBASE-1876 for more detailed
 blow-by-blow).  HBase operation will be held up during this time and
 eventually a regionserver will shut itself down to protect itself against
 dataloss if we can't successfully write HDFS.
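The failure mode described above boils down to the client having no memory of dead datanodes. A toy sketch of the exclusion behavior hdfs-630 adds (all names illustrative, not the actual DFSClient code):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BlockHostChooser {
    private final Set<String> deadNodes = new HashSet<>();

    // Recorded on a failed connect; hdfs-630 additionally reports this
    // set back to the namenode so it stops handing the node out.
    void markDead(String node) { deadNodes.add(node); }

    // Candidates come from the namenode; skip any node we know is dead
    // instead of retrying it over and over.
    String choose(List<String> candidates) {
        for (String node : candidates) {
            if (!deadNodes.contains(node)) return node;
        }
        return null; // all candidates known dead: re-ask the namenode, excluding them
    }

    public static void main(String[] args) {
        BlockHostChooser c = new BlockHostChooser();
        c.markDead("dn1:50010"); // failed connect to the killed slave
        List<String> fromNamenode = Arrays.asList("dn1:50010", "dn2:50010");
        System.out.println(c.choose(fromNamenode)); // prints dn2:50010
    }
}
```

On a 3-4 node cluster the candidate list is short, which is why a single dead entry the client keeps retrying stalls things so badly.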

 Thanks all,
 St.Ack


Re: [VOTE CANCELLED] Commit hdfs-630 to 0.21?

2009-12-20 Thread stack
Nicholas reviewed the hdfs-630 patch and made some suggestions for improvements.
 Cosmin, the patch writer, obliged.  After chatting with Nicholas and
Cosmin, I will revert the hdfs-630 patch that is in TRUNK and, if the new
patch passes hudson, will apply it instead.  I will then put up a new vote
to have the improved patch applied to 0.21.

Thanks to all who voted.
St.Ack




[jira] Created: (HDFS-721) ERROR Block blk_XXX_1030 already exists in state RBW and thus cannot be created

2009-10-21 Thread stack (JIRA)
ERROR Block blk_XXX_1030 already exists in state RBW and thus cannot be created
---

 Key: HDFS-721
 URL: https://issues.apache.org/jira/browse/HDFS-721
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.21.0
 Environment: dfs.support.append=true

Current branch-0.21 of hdfs, mapreduce, and common. Here is svn info:

URL: https://svn.apache.org/repos/asf/hadoop/hdfs/branches/branch-0.21
Repository Root: https://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 827883
Node Kind: directory
Schedule: normal
Last Changed Author: szetszwo
Last Changed Rev: 826906
Last Changed Date: 2009-10-20 00:16:25 + (Tue, 20 Oct 2009)
Reporter: stack


Running some loading tests against hdfs branch-0.21 I got the following:

{code}
2009-10-21 04:57:10,770 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving block blk_6345892463926159834_1030 src: /XX.XX.XX.141:53112 dest: 
/XX.XX.XX.140:51010
2009-10-21 04:57:10,771 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
writeBlock blk_6345892463926159834_1030 received exception 
org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
blk_6345892463926159834_1030 already exists in state RBW and thus cannot be 
created.
2009-10-21 04:57:10,771 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(XX.XX.XX.140:51010, 
storageID=DS-1292310101-XX.XX.XX.140-51010-1256100924816, infoPort=51075, 
ipcPort=51020):DataXceiver
org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
blk_6345892463926159834_1030 already exists in state RBW and thus cannot be 
created.
at 
org.apache.hadoop.hdfs.server.datanode.FSDataset.createTemporary(FSDataset.java:1324)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:98)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:258)
at 
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:382)
at 
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:323)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:111)
at java.lang.Thread.run(Thread.java:619)
{code}

On the sender side:

{code}
2009-10-21 04:57:10,740 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(XX.XX.XX.141:51010, 
storageID=DS-1870884070-XX.XX.XX.141-51010-1256100925196, infoPort=51075, 
ipcPort=51020) Starting thread to transfer block blk_6345892463926159834_1030 
to XX.XX.XX.140:51010
2009-10-21 04:57:10,770 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(XX.XX.XX.141:51010, 
storageID=DS-1870884070-XX.XX.XX.141-51010-1256100925196, infoPort=51075, 
ipcPort=51020):Failed to transfer blk_6345892463926159834_1030 to 
XX.XX.XX.140:51010 got java.net.SocketException: Original Exception : 
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:415)
at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:516)
at 
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:199)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:346)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:434)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1262)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Connection reset by peer
... 8 more
{code}

The block sequence number, 1030, is one more than that in issue HDFS-720 (same 
test run, but about 8 seconds between errors).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HDFS-721) ERROR Block blk_XXX_1030 already exists in state RBW and thus cannot be created

2009-10-21 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HDFS-721.


Resolution: Invalid

Working as designed.  Closing.





Re: [VOTE] port HDFS-127 (DFSClient block read failures cause open DFSInputStream to become unusable) to hadoop 0.20/0.21

2009-10-20 Thread stack
+1

On Mon, Oct 19, 2009 at 2:34 PM, Tsz Wo (Nicholas), Sze 
s29752-hadoop...@yahoo.com wrote:

 DFSClient has a retry mechanism on block acquiring for read.  If the number
 of retries reaches a certain limit (defined by
 dfs.client.max.block.acquire.failures), DFSClient will throw a
 BlockMissingException back to the user application.  In the current
 implementation, DFSClient counts the failures across multiple block
 acquiring operations, but the block acquiring operations are supposed to be
 independent.  HDFS-127 fixes this problem by counting the failures within a
 single operation.

 I propose to commit HDFS-127 to 0.20 and above since this fix is safe and
 very useful.

 Nicholas Sze
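The counting change described in the proposal can be sketched as follows. This is a toy illustration with invented names, not the real DFSClient code; the point is only that the failure counter is scoped to one block-acquire operation rather than surviving across calls for the life of the stream.

```java
// Illustrative sketch of the HDFS-127 fix: count block-acquire failures
// within a single read operation instead of accumulating them over the
// lifetime of the open stream.
class BlockAcquire {
    // Try each attempt in order; give up once maxFailures failures
    // accumulate within this one operation.  Pre-fix, the counter lived on
    // the stream, so unrelated earlier failures could trip the limit.
    static boolean acquireBlock(int maxFailures, boolean[] attempts) {
        int failures = 0;  // scoped to this operation only
        for (boolean success : attempts) {
            if (success) {
                return true;               // block located
            }
            if (++failures >= maxFailures) {
                return false;              // would surface BlockMissingException
            }
        }
        return false;
    }
}
```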




[jira] Created: (HDFS-720) NPE in BlockReceiver$PacketResponder.run(BlockReceiver.java:923)

2009-10-20 Thread stack (JIRA)
NPE in BlockReceiver$PacketResponder.run(BlockReceiver.java:923)


 Key: HDFS-720
 URL: https://issues.apache.org/jira/browse/HDFS-720
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.21.0
 Environment: Current branch-0.21 of hdfs, mapreduce, and common.  Here 
is svn info:

URL: https://svn.apache.org/repos/asf/hadoop/hdfs/branches/branch-0.21
Repository Root: https://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 827883
Node Kind: directory
Schedule: normal
Last Changed Author: szetszwo
Last Changed Rev: 826906
Last Changed Date: 2009-10-20 00:16:25 + (Tue, 20 Oct 2009)
Reporter: stack


Running some loading tests on hdfs, I had one of these on the DN XX.XX.XX.139:51010:

{code}
2009-10-21 04:57:02,755 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving block blk_6345892463926159834_1029 src: /XX,XX,XX.140:37890 dest: 
/XX.XX.XX.139:51010
2009-10-21 04:57:02,829 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder blk_6345892463926159834_1029 1 Exception 
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:923)
at java.lang.Thread.run(Thread.java:619)
{code}

On the XX.XX.XX.140 side, it looks like this:

{code}
10-21 04:57:01,866 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Receiving block blk_6345892463926159834_1029 src: /XX.XX.XX.140:37385 dest: 
/XX.XX.XX140:51010
2009-10-21 04:57:02,836 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
PacketResponder 2 for block blk_6345892463926159834_1029 terminating
2009-10-21 04:57:02,885 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(XX.XX.XX.140:51010, 
storageID=DS-1292310101-208.76.44.140-51010-1256100924816, infoPort=51075, 
ipcPort=51020):Exception writing block blk_6345892463926159834_1029 to mirror 
XX.XX.XX.139:51010
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
at sun.nio.ch.IOUtil.write(IOUtil.java:75)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
at 
org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
at 
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:466)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:434)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:573)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.opWriteBlock(DataXceiver.java:352)
at 
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opWriteBlock(DataTransferProtocol.java:382)
at 
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:323)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:111)
at java.lang.Thread.run(Thread.java:619)
{code}

Here is the bit of code inside the run method:
{code}
 922   pkt = ackQueue.getFirst();
 923   expected = pkt.seqno;
{code}

So 'pkt' is null?  But the LinkedList API says that getFirst() throws 
NoSuchElementException if the list is empty, so you'd think we wouldn't get an 
NPE here.  What am I missing?
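One thing worth noting: LinkedList permits null elements, so getFirst() can legitimately return null without the list being empty. A possible reading of the trace, then, is that a null was enqueued on ackQueue, not that the queue was empty. The small demo below shows the two cases; it is a generic Java illustration, not the BlockReceiver code itself.

```java
import java.util.LinkedList;
import java.util.NoSuchElementException;

// Demo of the two LinkedList.getFirst() outcomes relevant to the question:
// an empty list throws NoSuchElementException, while a list containing a
// null element returns null (which would NPE on a field access like
// pkt.seqno).
class GetFirstDemo {
    static String classify(LinkedList<Object> list) {
        try {
            Object first = list.getFirst();
            return first == null ? "null element" : "non-null element";
        } catch (NoSuchElementException e) {
            return "empty list";
        }
    }
}
```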
