[jira] Commented: (HADOOP-2329) [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data
[ https://issues.apache.org/jira/browse/HADOOP-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548573 ] Edward Yoon commented on HADOOP-2329: -
{code}
row   bikecar
row1  bike:name            Harley Davidson
...   bike:cc              800
...   bike:price           23,000
      bike:price_currency  U.S. dollar
...
{code}
For this case, I'm thinking of a different method for each type definition. [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data Key: HADOOP-2329 URL: https://issues.apache.org/jira/browse/HADOOP-2329 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Fix For: 0.16.0 A built-in data type is a fundamental data type that the hbase shell defines (character strings, scalars, ranges, arrays, etc.). If you need a specialized data type that is not currently provided as a built-in type, you are encouraged to write your own user-defined data type using UDC (not yet implemented), or contribute it for distribution in a future release of hbase shell. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
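A built-in numeric type would presumably need a byte encoding. As an illustration only (the issue does not specify an encoding, and the class and method names here are invented), a fixed-width big-endian encoding would let a value such as bike:cc 800 be stored and compared as a number rather than as the string "800":

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch of what a built-in numeric value type could do.
 * Fixed-width big-endian ints round-trip exactly and compare byte-wise
 * in numeric order, unlike decimal strings.
 */
public class IntValueType {
  public static byte[] encode(int value) {
    return ByteBuffer.allocate(4).putInt(value).array(); // big-endian by default
  }

  public static int decode(byte[] bytes) {
    return ByteBuffer.wrap(bytes).getInt();
  }
}
```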
[jira] Created: (HADOOP-2354) Add job-level counters for the launched speculative tasks
Add job-level counters for the launched speculative tasks - Key: HADOOP-2354 URL: https://issues.apache.org/jira/browse/HADOOP-2354 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Arun C Murthy Assignee: Arun C Murthy Fix For: 0.16.0 Add job-level counters for the launched speculative tasks; this should help track them. Ideally we would also have counters to check how many of the speculative tasks completed before the original task (thereby helping validate the strategy for launching speculative tasks), however we do not have this infrastructure yet (HADOOP-544) - so I'll file a follow-on bug for that feature. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2338) [hbase] NPE in master server
[ https://issues.apache.org/jira/browse/HADOOP-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548575 ] Hadoop QA commented on HADOOP-2338: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12370997/patch.txt against trunk revision r601221. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests -1. The patch failed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1267/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1267/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1267/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1267/console This message is automatically generated. [hbase] NPE in master server Key: HADOOP-2338 URL: https://issues.apache.org/jira/browse/HADOOP-2338 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Jim Kellerman Assignee: Jim Kellerman Fix For: 0.16.0 Attachments: master.log.gz, patch.txt Master gets an NPE after receiving multiple responses from the same server telling the master it has opened a region. 
{code}
2007-12-02 20:31:37,515 DEBUG hbase.HRegion - Next sequence id for region postlog,img254/577/02suecia024richardburnson0.jpg,1196619667879 is 73377537
2007-12-02 20:31:37,517 INFO hbase.HRegion - region postlog,img254/577/02suecia024richardburnson0.jpg,1196619667879 available
2007-12-02 20:31:39,200 WARN hbase.HRegionServer - Processing message (Retry: 0)
java.io.IOException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hbase.HMaster.processMsgs(HMaster.java:1484)
  at org.apache.hadoop.hbase.HMaster.regionServerReport(HMaster.java:1423)
  at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
  at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
  at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48)
  at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:759)
  at java.lang.Thread.run(Thread.java:619)

         case HMsg.MSG_REPORT_PROCESS_OPEN:
           synchronized (this.assignAttempts) {
             // Region server has acknowledged request to open region.
             // Extend region open time by 1/2 max region open time.
**1484**     assignAttempts.put(region.getRegionName(),
                 Long.valueOf(assignAttempts.get(region.getRegionName()).longValue()
                     + (this.maxRegionOpenTime / 2)));
           }
           break;
{code}
-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
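The trace points at the assignAttempts.get(...) call at HMaster.java:1484: when a duplicate MSG_REPORT_PROCESS_OPEN arrives after the region has already been removed from assignAttempts, get() returns null and the longValue() unboxing throws. A minimal null-guard sketch of that idea (illustrative only, with plain Java collections standing in for the HMaster internals; this is not the attached patch.txt):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch, not the HADOOP-2338 patch: a duplicate "region
 * opened" report can arrive after the region was removed from the
 * pending-assignment map, so the lookup must tolerate null instead of
 * unboxing it.
 */
public class AssignAttemptsGuard {
  /** Extends the open deadline; returns false if the region is no longer pending. */
  public static boolean extendOpenTime(Map<String, Long> assignAttempts,
                                       String regionName, long maxRegionOpenTime) {
    synchronized (assignAttempts) {
      Long current = assignAttempts.get(regionName);
      if (current == null) {
        return false; // duplicate or stale report: nothing to extend, no NPE
      }
      assignAttempts.put(regionName, current + maxRegionOpenTime / 2);
      return true;
    }
  }

  public static void main(String[] args) {
    Map<String, Long> attempts = new HashMap<>();
    attempts.put("postlog,row1", 1000L);
    System.out.println(extendOpenTime(attempts, "postlog,row1", 600));
    System.out.println(extendOpenTime(attempts, "unknown-region", 600));
  }
}
```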
[jira] Commented: (HADOOP-2349) FSEditLog.logEdit(byte op, Writable w1, Writable w2) should accept a variable number of Writables, instead of two.
[ https://issues.apache.org/jira/browse/HADOOP-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548607 ] Hadoop QA commented on HADOOP-2349: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12370983/2349_20071204.patch against trunk revision r601232. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests -1. The patch failed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1268/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1268/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1268/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1268/console This message is automatically generated. FSEditLog.logEdit(byte op, Writable w1, Writable w2) should accept a variable number of Writables, instead of two. Key: HADOOP-2349 URL: https://issues.apache.org/jira/browse/HADOOP-2349 Project: Hadoop Issue Type: Improvement Components: dfs Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: 2349_20071204.patch The new declaration should be {code} FSEditLog.logEdit(byte op, Writable ... w) {code} No Writable parameter may be null. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
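The proposed varargs shape might look like the following sketch (an assumption on my part, not the attached 2349_20071204.patch; a plain Object stands in for Hadoop's Writable so the example is self-contained):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the varargs signature the issue proposes: one parameter of
 * type Object... replaces the fixed pair (w1, w2), and nulls are
 * rejected up front as the issue requires.
 */
public class EditLogSketch {
  public final List<String> log = new ArrayList<>();

  /** Accepts any number of records; none may be null. */
  public void logEdit(byte op, Object... writables) {
    for (Object w : writables) {
      if (w == null) {
        throw new IllegalArgumentException("Writable parameters must not be null");
      }
    }
    log.add("op=" + op + " records=" + writables.length);
  }
}
```

A caller can then write logEdit(op, src) or logEdit(op, src, dst, timestamp) without new overloads for each arity.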
[jira] Commented: (HADOOP-2339) [Hbase Shell] Delete command with no WHERE clause
[ https://issues.apache.org/jira/browse/HADOOP-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548640 ] Hadoop QA commented on HADOOP-2339: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12370926/2339_v04.patch against trunk revision r601232. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests -1. The patch failed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1270/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1270/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1270/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1270/console This message is automatically generated. [Hbase Shell] Delete command with no WHERE clause - Key: HADOOP-2339 URL: https://issues.apache.org/jira/browse/HADOOP-2339 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Fix For: 0.16.0 Attachments: 2339.patch, 2339_v02.patch, 2339_v03.patch, 2339_v04.patch using HbaseAdmin.deleteColumn() method. {code} DELETE column_name FROM table_name; {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2356) Set memcache flush size per table
Set memcache flush size per table - Key: HADOOP-2356 URL: https://issues.apache.org/jira/browse/HADOOP-2356 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: Paul Saab Priority: Minor The amount of memory taken by the memcache before a flush is currently a global parameter. It should be configurable per-table. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2357) [hbase] Compaction cleanup; less deleting + prevent possible file leaks
[hbase] Compaction cleanup; less deleting + prevent possible file leaks --- Key: HADOOP-2357 URL: https://issues.apache.org/jira/browse/HADOOP-2357 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: stack Priority: Minor Fix For: 0.16.0 This issue is being created so I can commit the compaction patch that just passed hudson over in HADOOP-2283. That issue is about trouble accessing hdfs. It should stay open since we haven't yet figured out what's up. As a by-product of the investigation, the compaction patch was generated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2338) [hbase] NPE in master server
[ https://issues.apache.org/jira/browse/HADOOP-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548737 ] Jim Kellerman commented on HADOOP-2338: --- "If the region server fails to record the split in META, there is no other means for the master to find the daughter regions. What happens on restart? We pick up the parent again? The daughter regions will be bypassed?" Yes, there is a real potential for corrupting hbase in this case. Hence the quiesce shutdown mechanism proposed above. [hbase] NPE in master server Key: HADOOP-2338 URL: https://issues.apache.org/jira/browse/HADOOP-2338 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Jim Kellerman Assignee: Jim Kellerman Fix For: 0.16.0 Attachments: master.log.gz, patch.txt Master gets an NPE after receiving multiple responses from the same server telling the master it has opened a region. (Log and code excerpt elided; see the first HADOOP-2338 message above.) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2338) [hbase] NPE in master server
[ https://issues.apache.org/jira/browse/HADOOP-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548732 ] stack commented on HADOOP-2338: --- If the region server fails to record the split in META, there is no other means for the master to find the daughter regions. What happens on restart? We pick up the parent again? The daughter regions will be bypassed? [hbase] NPE in master server Key: HADOOP-2338 URL: https://issues.apache.org/jira/browse/HADOOP-2338 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Jim Kellerman Assignee: Jim Kellerman Fix For: 0.16.0 Attachments: master.log.gz, patch.txt Master gets an NPE after receiving multiple responses from the same server telling the master it has opened a region. (Log and code excerpt elided; see the first HADOOP-2338 message above.) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2348) [hbase] lock_id in HTable.startUpdate and commit/abort is misleading and useless
[ https://issues.apache.org/jira/browse/HADOOP-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2348: -- Component/s: contrib/hbase [hbase] lock_id in HTable.startUpdate and commit/abort is misleading and useless Key: HADOOP-2348 URL: https://issues.apache.org/jira/browse/HADOOP-2348 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: Bryan Duxbury Assignee: Jim Kellerman Priority: Minor In the past, the lock id returned by HTable.startUpdate was a real lock id from a remote server. However, that has been superseded by the BatchUpdate process, so now the lock id is just an arbitrary value. Moreover, it doesn't actually add any value: while it implies that you could start two updates on the same HTable and commit them separately, this is in fact not the case. Any attempt to do a second startUpdate throws an IllegalStateException. Since there is no added functionality afforded by the presence of this parameter, I suggest that we overload all methods that use it to ignore it and print a deprecation notice. startUpdate can just return a constant like 1 and eventually turn into a boolean or some other useful value. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
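The suggested deprecation could be sketched like this (hypothetical stand-in class; the real methods live on org.apache.hadoop.hbase.HTable and take HBase types):

```java
/**
 * Sketch of the deprecation approach the issue suggests: startUpdate
 * returns a meaningless constant, the lock-id-taking commit is kept only
 * for source compatibility and ignores its argument, and a second
 * startUpdate before commit/abort still throws.
 */
public class LockIdSketch {
  private boolean updateInProgress = false;

  /** Always returns 1; the value carries no meaning. */
  public long startUpdate(String row) {
    if (updateInProgress) {
      throw new IllegalStateException("update already in progress");
    }
    updateInProgress = true;
    return 1L;
  }

  /** @deprecated lockId is ignored; use {@link #commit()} instead. */
  @Deprecated
  public void commit(long lockId) {
    commit(); // the argument is deliberately unused
  }

  public void commit() {
    updateInProgress = false;
  }
}
```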
[jira] Commented: (HADOOP-2338) [hbase] NPE in master server
[ https://issues.apache.org/jira/browse/HADOOP-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548736 ] Jim Kellerman commented on HADOOP-2338: --- Here's a rough outline of how cluster shutdown should work:
- master receives shutdown request
- as each region server reports in, the master instructs the region server to 'quiesce'. This means that the region server should stop accepting requests for user regions, and close them. If it has no meta regions, it reports back to the master that it is exiting. Otherwise it reports that it is quiesced.
- once there are only quiesced region servers running, the master instructs them to shut down. They close the meta regions and tell the master that they have exited.
- when there are no more active region servers, the master can then shut down.
[hbase] NPE in master server Key: HADOOP-2338 URL: https://issues.apache.org/jira/browse/HADOOP-2338 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Jim Kellerman Assignee: Jim Kellerman Fix For: 0.16.0 Attachments: master.log.gz, patch.txt Master gets an NPE after receiving multiple responses from the same server telling the master it has opened a region.
(Log and code excerpt elided; see the first HADOOP-2338 message above.) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
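The quiesce outline in the comment above can be sketched as a toy state machine (all names here are invented for illustration; this is not HMaster's actual API):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of the proposed shutdown protocol: on shutdown, serving
 * servers quiesce (or exit immediately if they hold no meta regions);
 * once no server is still serving, quiesced servers are told to exit;
 * the master may exit only when every server has exited.
 */
public class QuiesceShutdown {
  public enum State { SERVING, QUIESCED, EXITED }

  public final Map<String, State> servers = new HashMap<>();
  private final Map<String, Boolean> hasMeta = new HashMap<>();
  private boolean shutdownRequested = false;

  public void addServer(String name, boolean servesMeta) {
    servers.put(name, State.SERVING);
    hasMeta.put(name, servesMeta);
  }

  public void requestShutdown() { shutdownRequested = true; }

  /** Called when a server reports in; returns the state it moves to. */
  public State report(String name) {
    State s = servers.get(name);
    if (!shutdownRequested) return s;
    if (s == State.SERVING) {
      // Close user regions; a server with no meta regions exits right away.
      s = hasMeta.get(name) ? State.QUIESCED : State.EXITED;
    } else if (s == State.QUIESCED && noneServing()) {
      s = State.EXITED; // master tells quiesced servers to close meta + exit
    }
    servers.put(name, s);
    return s;
  }

  public boolean noneServing() {
    return servers.values().stream().noneMatch(s -> s == State.SERVING);
  }

  public boolean masterMayExit() {
    return servers.values().stream().allMatch(s -> s == State.EXITED);
  }
}
```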
[jira] Updated: (HADOOP-2342) create a micro-benchmark for measuring local-file versus hdfs read
[ https://issues.apache.org/jira/browse/HADOOP-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-2342: -- Attachment: throughput.patch This benchmark reads and writes files using java.io, RawLocalFileSystem, LocalFileSystem, and HDFS and reports the time. create a micro-benchmark for measuring local-file versus hdfs read Key: HADOOP-2342 URL: https://issues.apache.org/jira/browse/HADOOP-2342 Project: Hadoop Issue Type: Test Components: dfs Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.16.0 Attachments: throughput.patch We should have a benchmark that measures reading a 10g file from hdfs and from local disk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
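The timing loop such a benchmark needs can be sketched with java.io alone (an assumed shape, not the attached throughput.patch, which also exercises RawLocalFileSystem, LocalFileSystem and HDFS):

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/**
 * Minimal read-throughput harness: write a test file, then time a
 * buffered sequential read and report the bytes actually read.
 * Checked IOExceptions are wrapped so callers stay simple.
 */
public class ReadThroughput {
  public static long timedRead(File f) {
    long total = 0;
    byte[] buf = new byte[64 * 1024];
    long start = System.nanoTime();
    try (InputStream in = new BufferedInputStream(new FileInputStream(f))) {
      int n;
      while ((n = in.read(buf)) > 0) total += n;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    long micros = (System.nanoTime() - start) / 1000;
    System.out.println("read " + total + " bytes in " + micros + " us");
    return total;
  }

  public static File writeTestFile(int bytes) {
    try {
      File f = File.createTempFile("throughput", ".dat");
      f.deleteOnExit();
      try (OutputStream out = new BufferedOutputStream(new FileOutputStream(f))) {
        for (int i = 0; i < bytes; i++) out.write(i & 0xff);
      }
      return f;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```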
[jira] Updated: (HADOOP-1841) IPC server should write responses asynchronously
[ https://issues.apache.org/jira/browse/HADOOP-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HADOOP-1841: - Status: Open (was: Patch Available) Findbugs warnings. IPC server should write responses asynchronously Key: HADOOP-1841 URL: https://issues.apache.org/jira/browse/HADOOP-1841 Project: Hadoop Issue Type: Improvement Components: ipc Reporter: Doug Cutting Assignee: dhruba borthakur Fix For: 0.16.0 Attachments: asyncRPC-2.patch, asyncRPC-4.patch, asyncRPC-5.patch, asyncRPC-6.patch, asyncRPC.patch, asyncRPC.patch Hadoop's IPC Server currently writes responses from request handler threads using blocking writes. Performance and scalability might be improved if responses were written asynchronously. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2359) PendingReplicationMonitor thread received exception. java.lang.InterruptedException
PendingReplicationMonitor thread received exception. java.lang.InterruptedException --- Key: HADOOP-2359 URL: https://issues.apache.org/jira/browse/HADOOP-2359 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.16.0 Reporter: Owen O'Malley Assignee: dhruba borthakur Fix For: 0.16.0 I sometimes get the message: 07/12/05 19:01:36 WARN fs.FSNamesystem: PendingReplicationMonitor thread received exception. java.lang.InterruptedException: sleep interrupted from mini-dfs cluster. InterruptedExceptions should be handled quietly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2012) Periodic verification at the Datanode
[ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548765 ] Raghu Angadi commented on HADOOP-2012: -- For Windows, I will make it work most of the time; very rarely, some updates to the last verification time might fail, and that's ok. I will see how much this will take. Another option, of course, is to fix it properly and make it work equally well everywhere. Periodic verification at the Datanode - Key: HADOOP-2012 URL: https://issues.apache.org/jira/browse/HADOOP-2012 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Raghu Angadi Assignee: Raghu Angadi Fix For: 0.16.0 Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch Currently, on-disk corruption of data blocks is detected only when a block is read by the client or by another datanode. These errors would be detected much earlier if the datanode could periodically verify the data checksums for its local blocks. Some of the issues to consider: - How often should we check the blocks (no more often than once every couple of weeks?) - How do we keep track of when a block was last verified (there is a .meta file associated with each block). - What action to take once a corruption is detected. - Scanning should be done at a very low priority, with the rest of the datanode disk traffic in mind. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
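The "when was a block last verified" bookkeeping could look like the following sketch (a hypothetical helper, not the attached HADOOP-2012.patch): given each block's last verification time, pick the blocks due for a new checksum scan.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/**
 * Illustrative scheduler for periodic block verification: a block is
 * due when at least scanPeriod has elapsed since its last scan (e.g.
 * a couple of weeks). A real scanner would also throttle disk I/O.
 */
public class VerificationSchedule {
  public static List<String> blocksDue(Map<String, Long> lastVerified,
                                       long now, long scanPeriod) {
    List<String> due = new ArrayList<>();
    for (Map.Entry<String, Long> e : lastVerified.entrySet()) {
      if (now - e.getValue() >= scanPeriod) due.add(e.getKey());
    }
    Collections.sort(due); // deterministic order for the scan queue
    return due;
  }
}
```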
[jira] Commented: (HADOOP-2356) Set memcache flush size per table
[ https://issues.apache.org/jira/browse/HADOOP-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548731 ] Jim Kellerman commented on HADOOP-2356: --- Actually, since there is one memcache per column, this could be set on a per-column basis. Set memcache flush size per table - Key: HADOOP-2356 URL: https://issues.apache.org/jira/browse/HADOOP-2356 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: Paul Saab Priority: Minor The amount of memory taken by the memcache before a flush is currently a global parameter. It should be configurable per-table. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2342) create a micro-benchmark for measuring local-file versus hdfs read
[ https://issues.apache.org/jira/browse/HADOOP-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-2342: -- Status: Patch Available (was: Open) create a micro-benchmark for measuring local-file versus hdfs read Key: HADOOP-2342 URL: https://issues.apache.org/jira/browse/HADOOP-2342 Project: Hadoop Issue Type: Test Components: dfs Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.16.0 Attachments: throughput.patch We should have a benchmark that measures reading a 10g file from hdfs and from local disk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1841) IPC server should write responses asynchronously
[ https://issues.apache.org/jira/browse/HADOOP-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HADOOP-1841: - Attachment: asyncRPC-6.patch Fixed findbugs warnings. IPC server should write responses asynchronously Key: HADOOP-1841 URL: https://issues.apache.org/jira/browse/HADOOP-1841 Project: Hadoop Issue Type: Improvement Components: ipc Reporter: Doug Cutting Assignee: dhruba borthakur Fix For: 0.16.0 Attachments: asyncRPC-2.patch, asyncRPC-4.patch, asyncRPC-5.patch, asyncRPC-6.patch, asyncRPC.patch, asyncRPC.patch Hadoop's IPC Server currently writes responses from request handler threads using blocking writes. Performance and scalability might be improved if responses were written asynchronously. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2012) Periodic verification at the Datanode
[ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548756 ] eric baldeschwieler commented on HADOOP-2012: - I'd really rather see us get this right for supported platforms than declare an issue done when we know it does not work on some platforms. This is particularly vexing when design choices could clearly be made that would avoid these issues. -1 Periodic verification at the Datanode - Key: HADOOP-2012 URL: https://issues.apache.org/jira/browse/HADOOP-2012 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Raghu Angadi Assignee: Raghu Angadi Fix For: 0.16.0 Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch Currently, on-disk corruption of data blocks is detected only when a block is read by the client or by another datanode. These errors would be detected much earlier if the datanode could periodically verify the data checksums for its local blocks. Some of the issues to consider: - How often should we check the blocks (no more often than once every couple of weeks?) - How do we keep track of when a block was last verified (there is a .meta file associated with each block). - What action to take once a corruption is detected. - Scanning should be done at a very low priority, with the rest of the datanode disk traffic in mind. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-4) tool to mount dfs on linux
[ https://issues.apache.org/jira/browse/HADOOP-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548789 ] Anurag Sharma commented on HADOOP-4: hi Doug, Thanks for pointing out this issue. I will remove the FUSE-J patch and try one of the other routes you suggested (to have a patched FUSE-J available), and will come back with a resolution on this very soon. -anurag tool to mount dfs on linux -- Key: HADOOP-4 URL: https://issues.apache.org/jira/browse/HADOOP-4 Project: Hadoop Issue Type: Improvement Components: fs Affects Versions: 0.5.0 Environment: linux only Reporter: John Xing Assignee: Doug Cutting Attachments: fuse-hadoop-0.1.0_fuse-j.2.2.3_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.0_fuse-j.2.4_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.1.tar.gz, fuse-j-hadoopfs-0.1.zip, fuse-j-patch.zip tool to mount dfs on linux -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2355) Set region split size on table creation
[ https://issues.apache.org/jira/browse/HADOOP-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548733 ] Jim Kellerman commented on HADOOP-2355: --- The finest level of granularity for this parameter would be the table level, since a region split affects all the columns in a particular row range. Set region split size on table creation --- Key: HADOOP-2355 URL: https://issues.apache.org/jira/browse/HADOOP-2355 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: Paul Saab Priority: Minor Right now the region size before a split is determined by a global configuration. It would be nice to configure tables independently of the global parameter. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1652) Rebalance data blocks when new data nodes added or data nodes become full
[ https://issues.apache.org/jira/browse/HADOOP-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HADOOP-1652: - Resolution: Fixed Status: Resolved (was: Patch Available) I just committed this. Thanks Hairong! Rebalance data blocks when new data nodes added or data nodes become full - Key: HADOOP-1652 URL: https://issues.apache.org/jira/browse/HADOOP-1652 Project: Hadoop Issue Type: New Feature Components: dfs Affects Versions: 0.13.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.16.0 Attachments: balancer.patch, balancer1.patch, balancer2.patch, balancer3.patch, balancer4.patch, balancer5.patch, balancer6.patch, balancer7.patch, balancer8.patch, BalancerAdminGuide.pdf, BalancerAdminGuide1.pdf, BalancerUserGuide2.pdf, RebalanceDesign4.pdf, RebalanceDesign5.pdf, RebalanceDesign6.pdf When a new data node joins an hdfs cluster, it does not hold much data, so any map task assigned to the machine most likely does not read local data, thus increasing the use of network bandwidth. On the other hand, when some data nodes become full, new data blocks are placed only on non-full data nodes, thus reducing their read parallelism. This jira aims to find an approach to redistribute data blocks when imbalance occurs in the cluster. A solution should meet the following requirements: 1. It maintains data availability guarantees, in the sense that rebalancing does not reduce the number of replicas a block has or the number of racks the block resides on. 2. An administrator should be able to invoke and interrupt rebalancing from a command line. 3. Rebalancing should be throttled so that it does not cause a namenode to be too busy to serve any incoming request, or saturate the network. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
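Requirement 3 (throttling) can be sketched as a simple per-period byte budget (illustrative only; this is not the balancer patch's actual throttler, and the names are invented):

```java
/**
 * Toy bandwidth throttle: track bytes moved in the current period and,
 * once over budget, sleep out the remainder of the period before the
 * next transfer. Interrupts are preserved rather than swallowed.
 */
public class BandwidthThrottle {
  private final long bytesPerPeriod;
  private final long periodMillis;
  private long periodStart;
  private long bytesThisPeriod = 0;

  public BandwidthThrottle(long bytesPerPeriod, long periodMillis) {
    this.bytesPerPeriod = bytesPerPeriod;
    this.periodMillis = periodMillis;
    this.periodStart = System.currentTimeMillis();
  }

  /** Call after moving {@code bytes}; returns true if it had to sleep. */
  public synchronized boolean throttle(long bytes) {
    bytesThisPeriod += bytes;
    long now = System.currentTimeMillis();
    if (now - periodStart >= periodMillis) { // a new period has begun
      periodStart = now;
      bytesThisPeriod = bytes;
      return false;
    }
    if (bytesThisPeriod > bytesPerPeriod) {  // over budget: wait out the period
      try {
        Thread.sleep(periodMillis - (now - periodStart));
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();  // restore the interrupt flag
      }
      periodStart = System.currentTimeMillis();
      bytesThisPeriod = 0;
      return true;
    }
    return false;
  }
}
```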
[jira] Updated: (HADOOP-2160) separate website from user documentation
[ https://issues.apache.org/jira/browse/HADOOP-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Cutting updated HADOOP-2160: - Attachment: trunk.patch One can now check out http://svn.apache.org/repos/asf/lucene/hadoop/site/publish to preview the new top-level site. This is a patch for trunk, removing all of the project-level documentation, so that all that remains is end-user documentation, suitable for distribution with a release, or for linking to from the site. I've also added a docs target to the top-level build.xml that runs forrest. Unless there are objections, I'll commit this soon and update the website. separate website from user documentation Key: HADOOP-2160 URL: https://issues.apache.org/jira/browse/HADOOP-2160 Project: Hadoop Issue Type: Improvement Reporter: Doug Cutting Assignee: Doug Cutting Attachments: trunk.patch Currently the website only contains the documentation for a single release, the current release. It would be better if the website also contained documentation for past releases, since not everyone is using the current release. To implement this we should move the top-level of the website, including project and developer information, from the subversion trunk into a separate tree, so that only the user documentation is branched per release. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-4) tool to mount dfs on linux
[ https://issues.apache.org/jira/browse/HADOOP-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548807 ] Doug Cutting commented on HADOOP-4: --- I went through the license for Fuse-J and it is distributed under LGPL. Unfortunately, the ASF cannot host things published under LGPL either. Sorry! tool to mount dfs on linux -- Key: HADOOP-4 URL: https://issues.apache.org/jira/browse/HADOOP-4 Project: Hadoop Issue Type: Improvement Components: fs Affects Versions: 0.5.0 Environment: linux only Reporter: John Xing Assignee: Doug Cutting Attachments: fuse-hadoop-0.1.0_fuse-j.2.2.3_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.0_fuse-j.2.4_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.1.tar.gz, fuse-j-hadoopfs-0.1.zip, fuse-j-patch.zip tool to mount dfs on linux -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-4) tool to mount dfs on linux
[ https://issues.apache.org/jira/browse/HADOOP-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548806 ] Anurag Sharma commented on HADOOP-4: Hi Doug, I went through the license for Fuse-J and it is distributed under LGPL, do you think that would allow the Fuse-J patches to be hosted on Apache? (In the latter case we would still modify the submission above to be a contrib module that downloads Fuse-J, applies our patch, and builds it, except we won't have to find a place to host the patch). -thanks -anurag tool to mount dfs on linux -- Key: HADOOP-4 URL: https://issues.apache.org/jira/browse/HADOOP-4 Project: Hadoop Issue Type: Improvement Components: fs Affects Versions: 0.5.0 Environment: linux only Reporter: John Xing Assignee: Doug Cutting Attachments: fuse-hadoop-0.1.0_fuse-j.2.2.3_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.0_fuse-j.2.4_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.1.tar.gz, fuse-j-hadoopfs-0.1.zip, fuse-j-patch.zip tool to mount dfs on linux -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2359) PendingReplicationMonitor thread received exception. java.lang.InterruptedException
[ https://issues.apache.org/jira/browse/HADOOP-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HADOOP-2359: - Attachment: replicationWarning.patch Changed warning message to debug. PendingReplicationMonitor thread received exception. java.lang.InterruptedException --- Key: HADOOP-2359 URL: https://issues.apache.org/jira/browse/HADOOP-2359 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.16.0 Reporter: Owen O'Malley Assignee: dhruba borthakur Fix For: 0.16.0 Attachments: replicationWarning.patch I sometimes get the message: 07/12/05 19:01:36 WARN fs.FSNamesystem: PendingReplicationMonitor thread received exception. java.lang.InterruptedException: sleep interrupted from mini-dfs cluster. InterruptedExceptions should be handled quietly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
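The intent of the fix ("InterruptedExceptions should be handled quietly") can be sketched as a monitor thread that treats interruption as a normal shutdown signal rather than an error worth a WARN. This is an illustrative stand-in, not the actual PendingReplicationMonitor code.

```java
// Sketch: a periodic monitor that exits quietly on interrupt. In the real
// fix, the catch block logs at debug level instead of warn; here we simply
// record whether anything *other* than an interrupt went wrong.
class QuietMonitor implements Runnable {
    volatile boolean sawError = false;

    public void run() {
        try {
            while (true) {
                // periodic verification work would go here
                Thread.sleep(100);
            }
        } catch (InterruptedException ie) {
            // Expected during shutdown (e.g. mini-dfs cluster teardown):
            // exit quietly; at most log at debug level.
        } catch (RuntimeException e) {
            sawError = true;  // anything else is a genuine problem
        }
    }
}
```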
[jira] Updated: (HADOOP-2359) PendingReplicationMonitor thread received exception. java.lang.InterruptedException
[ https://issues.apache.org/jira/browse/HADOOP-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HADOOP-2359: - Status: Patch Available (was: Open) PendingReplicationMonitor thread received exception. java.lang.InterruptedException --- Key: HADOOP-2359 URL: https://issues.apache.org/jira/browse/HADOOP-2359 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.16.0 Reporter: Owen O'Malley Assignee: dhruba borthakur Fix For: 0.16.0 Attachments: replicationWarning.patch I sometimes get the message: 07/12/05 19:01:36 WARN fs.FSNamesystem: PendingReplicationMonitor thread received exception. java.lang.InterruptedException: sleep interrupted from mini-dfs cluster. InterruptedExceptions should be handled quietly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-496) Expose HDFS as a WebDAV store
[ https://issues.apache.org/jira/browse/HADOOP-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anurag Sharma updated HADOOP-496: - Attachment: (was: fuse-j-hadoopfs-0.zip) Expose HDFS as a WebDAV store - Key: HADOOP-496 URL: https://issues.apache.org/jira/browse/HADOOP-496 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Michel Tourn Assignee: Enis Soztutar Attachments: hadoop-496-3.patch, hadoop-496-4.patch, hadoop-496-spool-cleanup.patch, hadoop-webdav.zip, jetty-slide.xml, lib.webdav.tar.gz, screenshot-1.jpg, slideusers.properties, webdav_wip1.patch, webdav_wip2.patch WebDAV stands for Distributed Authoring and Versioning. It is a set of extensions to the HTTP protocol that lets users collaboratively edit and manage files on a remote web server. It is often considered a replacement for NFS or SAMBA. HDFS (Hadoop Distributed File System) needs a friendly file system interface. DFSShell commands are unfamiliar. Instead it is more convenient for Hadoop users to use a mountable network drive. A friendly interface to HDFS will be used both for casual browsing of data and for bulk import/export. The FUSE provider for HDFS is already available ( http://issues.apache.org/jira/browse/HADOOP-17 ) but it had scalability problems. WebDAV is a popular alternative. The typical licensing terms for WebDAV tools are also attractive: GPL for Linux client tools that Hadoop would not redistribute anyway. More importantly, Apache Project/Apache license for Java tools and for server components. This allows for a tighter integration with the HDFS code base. There are some interesting Apache projects that support WebDAV. 
But these are probably too heavyweight for the needs of Hadoop: Tomcat servlet: http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/servlets/WebdavServlet.html Slide: http://jakarta.apache.org/slide/ Being HTTP-based and backwards-compatible with Web Browser clients, the WebDAV server protocol could even be piggy-backed on the existing Web UI ports of the Hadoop name node / data nodes. WebDAV can be hosted as (Jetty) servlets. This minimizes server code bloat and avoids additional network traffic between HDFS and the WebDAV server. General Clients (read-only): Any web browser Linux Clients: Mountable GPL davfs2 http://dav.sourceforge.net/ FTP-like GPL Cadaver http://www.webdav.org/cadaver/ Server Protocol compliance tests: http://www.webdav.org/neon/litmus/ A goal is for Hadoop HDFS to pass this test (minus support for Properties) Pure Java clients: DAV Explorer Apache lic. http://www.ics.uci.edu/~webdav/ WebDAV also makes it convenient to add advanced features in an incremental fashion: file locking, access control lists, hard links, symbolic links. New WebDAV standards continue to be accepted, and WebDAV clients with varying levels of feature support exist. core http://www.webdav.org/specs/rfc2518.html ACLs http://www.webdav.org/specs/rfc3744.html redirects soft links http://greenbytes.de/tech/webdav/rfc4437.html BIND hard links http://www.webdav.org/bind/ quota http://tools.ietf.org/html/rfc4331 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2160) separate website from user documentation
[ https://issues.apache.org/jira/browse/HADOOP-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548734 ] Doug Cutting commented on HADOOP-2160: -- FYI, I will make some commits on this issue without first submitting patches, since it involves a lot of subversion commands that are not amenable to patches. separate website from user documentation Key: HADOOP-2160 URL: https://issues.apache.org/jira/browse/HADOOP-2160 Project: Hadoop Issue Type: Improvement Reporter: Doug Cutting Assignee: Doug Cutting Currently the website only contains the documentation for a single release, the current release. It would be better if the website also contained documentation for past releases, since not everyone is using the current release. To implement this we should move the top-level of the website, including project and developer information, from the subversion trunk into a separate tree, so that only the user documentation is branched per release. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2012) Periodic verification at the Datanode
[ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548797 ] Konstantin Shvachko commented on HADOOP-2012: - The question here is whether we would go with our current decision if we knew it would not be supported on Windows? If we let the balancer write log-type data (verified block #s) into a special file balancer.log instead of modifying meta-data files, will that be a problem? Looks like Eric already had a proposal of scanning blocks in a predetermined order. Should we reconsider this? Periodic verification at the Datanode - Key: HADOOP-2012 URL: https://issues.apache.org/jira/browse/HADOOP-2012 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Raghu Angadi Assignee: Raghu Angadi Fix For: 0.16.0 Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch Currently on-disk data corruption on data blocks is detected only when it is read by the client or by another datanode. These errors are detected much earlier if the datanode can periodically verify the data checksums for the local blocks. Some of the issues to consider : - How should we check the blocks (no more often than once every couple of weeks?) - How do we keep track of when a block was last verified (there is a .meta file associated with each block). - What action to take once a corruption is detected - Scanning should be done as a very low priority with the rest of the datanode disk traffic in mind. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
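The core check the issue describes (recompute a block's checksum and compare it with the value recorded alongside the block in its .meta file) can be sketched like this. BlockVerifier is a hypothetical helper for illustration, not the actual datanode code, and it uses CRC32 as a stand-in checksum.

```java
import java.util.zip.CRC32;

// Sketch of periodic block verification: recompute the checksum of a
// block's bytes and compare it with the stored value. A mismatch means
// on-disk corruption and would trigger re-replication from a good replica.
class BlockVerifier {
    static long checksum(byte[] blockData) {
        CRC32 crc = new CRC32();
        crc.update(blockData, 0, blockData.length);
        return crc.getValue();
    }

    // A block passes verification iff its current checksum matches the stored one.
    static boolean verify(byte[] blockData, long storedChecksum) {
        return checksum(blockData) == storedChecksum;
    }
}
```

The scheduling questions in the issue (how often, how to record the last-verified time, how to keep scanning low-priority) sit around this check rather than inside it.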
[jira] Created: (HADOOP-2360) hadoop::RecordReader::read() throws exception in HadoopPipes::RecordWriter
hadoop::RecordReader::read() throws exception in HadoopPipes::RecordWriter -- Key: HADOOP-2360 URL: https://issues.apache.org/jira/browse/HADOOP-2360 Project: Hadoop Issue Type: Bug Affects Versions: 0.14.3 Reporter: Yiping Han Priority: Blocker The jute record has the format: class SampleValue { ustring data; } And HadoopPipes::RecordWriter::emit() has code like this: void SampleRecordWriterC::emit(const std::string& key, const std::string& value) { if (key.empty() || value.empty()) { return; } hadoop::StringInStream key_in_stream(const_cast<std::string&>(key)); hadoop::RecordReader key_record_reader(key_in_stream, hadoop::kCSV); EmitKeyT emit_key; key_record_reader.read(emit_key); hadoop::StringInStream value_in_stream(const_cast<std::string&>(value)); hadoop::RecordReader value_record_reader(value_in_stream, hadoop::kCSV); EmitValueT emit_value; value_record_reader.read(emit_value); return; } And the code throws hadoop::IOException at the read() line. In the mapper, I have faked the emitted record with the following code: std::string value; EmitValueT emit_value; emit_value.getData().assign(FakeData); hadoop::StringOutStream value_out_stream(value); hadoop::RecordWriter value_record_writer(value_out_stream, hadoop::kCSV); value_record_writer.write(emit_value); We haven't updated to the latest version of hadoop, but I've searched the tickets and didn't find one reporting this problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1298) adding user info to file
[ https://issues.apache.org/jira/browse/HADOOP-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HADOOP-1298: --- Attachment: (was: 20071116b.patch) adding user info to file Key: HADOOP-1298 URL: https://issues.apache.org/jira/browse/HADOOP-1298 Project: Hadoop Issue Type: New Feature Components: dfs, fs Reporter: Kurtis Heimerl Assignee: Christophe Taton Attachments: 1298_2007-09-22_1.patch, 1298_2007-10-04_1.patch, 1298_20071205.patch, hadoop-user-munncha.patch17 I'm working on adding a permissions model to hadoop's DFS. The first step is this change, which associates user info with files. Following this I'll associate permissions info, then block methods based on that user info, then authorization of the user info. So, right now I've implemented adding user info to files. I'm looking for feedback before I clean this up and make it official. I wasn't sure what release; I'm working off trunk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1841) IPC server should write responses asynchronously
[ https://issues.apache.org/jira/browse/HADOOP-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548689 ] Hadoop QA commented on HADOOP-1841: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12371003/asyncRPC-5.patch against trunk revision r601232. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs -1. The patch appears to introduce 2 new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests -1. The patch failed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1272/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1272/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1272/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1272/console This message is automatically generated. IPC server should write responses asynchronously Key: HADOOP-1841 URL: https://issues.apache.org/jira/browse/HADOOP-1841 Project: Hadoop Issue Type: Improvement Components: ipc Reporter: Doug Cutting Assignee: dhruba borthakur Fix For: 0.16.0 Attachments: asyncRPC-2.patch, asyncRPC-4.patch, asyncRPC-5.patch, asyncRPC.patch, asyncRPC.patch Hadoop's IPC Server currently writes responses from request handler threads using blocking writes. Performance and scalability might be improved if responses were written asynchronously. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2012) Periodic verification at the Datanode
[ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548824 ] Raghu Angadi commented on HADOOP-2012: -- The question here is whether we would go with our current decision if we knew it would not be supported on Windows? A related concern is whether this is required for appends anyway. Periodic verification at the Datanode - Key: HADOOP-2012 URL: https://issues.apache.org/jira/browse/HADOOP-2012 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Raghu Angadi Assignee: Raghu Angadi Fix For: 0.16.0 Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch Currently on-disk data corruption on data blocks is detected only when it is read by the client or by another datanode. These errors are detected much earlier if the datanode can periodically verify the data checksums for the local blocks. Some of the issues to consider : - How should we check the blocks (no more often than once every couple of weeks?) - How do we keep track of when a block was last verified (there is a .meta file associated with each block). - What action to take once a corruption is detected - Scanning should be done as a very low priority with the rest of the datanode disk traffic in mind. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1841) IPC server should write responses asynchronously
[ https://issues.apache.org/jira/browse/HADOOP-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548799 ] Hadoop QA commented on HADOOP-1841: --- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12371062/asyncRPC-6.patch against trunk revision r601383. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1273/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1273/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1273/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1273/console This message is automatically generated. IPC server should write responses asynchronously Key: HADOOP-1841 URL: https://issues.apache.org/jira/browse/HADOOP-1841 Project: Hadoop Issue Type: Improvement Components: ipc Reporter: Doug Cutting Assignee: dhruba borthakur Fix For: 0.16.0 Attachments: asyncRPC-2.patch, asyncRPC-4.patch, asyncRPC-5.patch, asyncRPC-6.patch, asyncRPC.patch, asyncRPC.patch Hadoop's IPC Server currently writes responses from request handler threads using blocking writes. Performance and scalability might be improved if responses were written asynchronously. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
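The idea behind this issue can be sketched as a producer/consumer split: handler threads hand finished responses to a queue instead of writing to the socket themselves, and a dedicated responder drains the queue, so a slow client cannot stall a handler. This is a minimal illustration only; the actual patch uses non-blocking NIO channel writes, and the names below are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: decouple RPC handlers from the network. Handlers enqueue and
// return immediately; only the responder thread touches the "socket"
// (a StringBuilder stands in for it here).
class AsyncResponder {
    private final BlockingQueue<byte[]> pending = new LinkedBlockingQueue<>();
    final StringBuilder wire = new StringBuilder();  // stand-in for the client socket

    // Called by handler threads: never blocks on the network.
    void enqueue(byte[] response) {
        pending.add(response);
    }

    // Run repeatedly by the single responder thread.
    void drainOnce() throws InterruptedException {
        byte[] r = pending.take();
        wire.append(new String(r));  // the only place that writes to the wire
    }
}
```

With this split, handler threads are freed to serve new requests as soon as the response is computed, which is where the scalability win comes from.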
[jira] Updated: (HADOOP-1652) Rebalance data blocks when new data nodes added or data nodes become full
[ https://issues.apache.org/jira/browse/HADOOP-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HADOOP-1652: -- Attachment: balancer8.patch The patch has a minor change to make the junit test run faster. Rebalance data blocks when new data nodes added or data nodes become full - Key: HADOOP-1652 URL: https://issues.apache.org/jira/browse/HADOOP-1652 Project: Hadoop Issue Type: New Feature Components: dfs Affects Versions: 0.13.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.16.0 Attachments: balancer.patch, balancer1.patch, balancer2.patch, balancer3.patch, balancer4.patch, balancer5.patch, balancer6.patch, balancer7.patch, balancer8.patch, BalancerAdminGuide.pdf, BalancerAdminGuide1.pdf, BalancerUserGuide2.pdf, RebalanceDesign4.pdf, RebalanceDesign5.pdf, RebalanceDesign6.pdf When a new data node joins the hdfs cluster, it does not hold much data. So any map task assigned to the machine most likely does not read local data, thus increasing the use of network bandwidth. On the other hand, when some data nodes become full, new data blocks are placed on only non-full data nodes, thus reducing their read parallelism. This jira aims to find an approach to redistribute data blocks when imbalance occurs in the cluster. A solution should meet the following requirements: 1. It maintains data availability guarantees in the sense that rebalancing does not reduce the number of replicas that a block has or the number of racks that the block resides on. 2. An administrator should be able to invoke and interrupt rebalancing from a command line. 3. Rebalancing should be throttled so that rebalancing does not cause a namenode to be too busy to serve any incoming request or saturate the network. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HADOOP-2357) [hbase] Compaction cleanup; less deleting + prevent possible file leaks
[ https://issues.apache.org/jira/browse/HADOOP-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HADOOP-2357. --- Resolution: Fixed Resolving. Committed the compaction.patch from over in HADOOP-2283. [hbase] Compaction cleanup; less deleting + prevent possible file leaks --- Key: HADOOP-2357 URL: https://issues.apache.org/jira/browse/HADOOP-2357 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: stack Priority: Minor Fix For: 0.16.0 This issue is being created so I can commit the compaction patch that just passed hudson over in HADOOP-2283. That issue is about trouble accessing hdfs. It should stay open since we haven't yet figured out what's up. As a by-product of the investigation, the compaction patch was generated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HADOOP-1327) Doc on Streaming
[ https://issues.apache.org/jira/browse/HADOOP-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley reassigned HADOOP-1327: - Assignee: Rob Weltman Doc on Streaming Key: HADOOP-1327 URL: https://issues.apache.org/jira/browse/HADOOP-1327 Project: Hadoop Issue Type: Improvement Components: documentation Reporter: Runping Qi Assignee: Rob Weltman Attachments: HADOOP-1327.patch, site.xml, streaming.html, streaming.xml -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2012) Periodic verification at the Datanode
[ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548828 ] dhruba borthakur commented on HADOOP-2012: -- For appends, a reader can be reading the datafile and metafile while the writer is still writing to them. This is supported on Linux as well as Windows. Periodic verification at the Datanode - Key: HADOOP-2012 URL: https://issues.apache.org/jira/browse/HADOOP-2012 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Raghu Angadi Assignee: Raghu Angadi Fix For: 0.16.0 Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch Currently on-disk data corruption on data blocks is detected only when it is read by the client or by another datanode. These errors are detected much earlier if the datanode can periodically verify the data checksums for the local blocks. Some of the issues to consider : - How should we check the blocks (no more often than once every couple of weeks?) - How do we keep track of when a block was last verified (there is a .meta file associated with each block). - What action to take once a corruption is detected - Scanning should be done as a very low priority with the rest of the datanode disk traffic in mind. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2342) create a micro-benchmark to measure local-file versus hdfs read
[ https://issues.apache.org/jira/browse/HADOOP-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548820 ] Hadoop QA commented on HADOOP-2342: --- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12371063/throughput.patch against trunk revision r601491. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1274/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1274/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1274/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1274/console This message is automatically generated. create a micro-benchmark to measure local-file versus hdfs read Key: HADOOP-2342 URL: https://issues.apache.org/jira/browse/HADOOP-2342 Project: Hadoop Issue Type: Test Components: dfs Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.16.0 Attachments: throughput.patch We should have a benchmark that measures reading a 10g file from hdfs and from local disk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
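The core of such a benchmark is just a timed chunked read. The sketch below takes any InputStream so the same loop can be pointed at a local file or an HDFS stream; the class and method names are illustrative, not the contents of throughput.patch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of the benchmark's inner loop: read a stream to exhaustion in
// large chunks and return the byte count. A driver would wrap this in
// System.currentTimeMillis() calls and divide bytes by elapsed seconds
// to report MB/s for the local-file and hdfs cases.
class ReadThroughput {
    static long readAll(InputStream in, byte[] buf) throws IOException {
        long total = 0;
        int n;
        while ((n = in.read(buf)) > 0) {
            total += n;  // count every byte actually delivered
        }
        return total;
    }
}
```

For the real 10g comparison, the driver would open the file once per run and use a buffer large enough (e.g. 64 KB or more) that per-call overhead does not dominate the measurement.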
[jira] Commented: (HADOOP-2012) Periodic verification at the Datanode
[ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548832 ] Raghu Angadi commented on HADOOP-2012: -- This is supported on Linux as well as Windows. Can we use that code here? I'm guessing it handles upgraded directories also... Periodic verification at the Datanode - Key: HADOOP-2012 URL: https://issues.apache.org/jira/browse/HADOOP-2012 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Raghu Angadi Assignee: Raghu Angadi Fix For: 0.16.0 Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch Currently on-disk data corruption on data blocks is detected only when it is read by the client or by another datanode. These errors are detected much earlier if the datanode can periodically verify the data checksums for the local blocks. Some of the issues to consider : - How should we check the blocks (no more often than once every couple of weeks?) - How do we keep track of when a block was last verified (there is a .meta file associated with each block). - What action to take once a corruption is detected - Scanning should be done as a very low priority with the rest of the datanode disk traffic in mind. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2342) create a micro-benchmark to measure local-file versus hdfs read
[ https://issues.apache.org/jira/browse/HADOOP-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-2342: -- Status: Open (was: Patch Available) create a micro-benchmark to measure local-file versus hdfs read Key: HADOOP-2342 URL: https://issues.apache.org/jira/browse/HADOOP-2342 Project: Hadoop Issue Type: Test Components: dfs Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.16.0 Attachments: throughput.patch We should have a benchmark that measures reading a 10g file from hdfs and from local disk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2342) create a micro-benchmark to measure local-file versus hdfs read
[ https://issues.apache.org/jira/browse/HADOOP-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-2342: -- Status: Patch Available (was: Open) Needs to be re-reviewed by QA. create a micro-benchmark to measure local-file versus hdfs read Key: HADOOP-2342 URL: https://issues.apache.org/jira/browse/HADOOP-2342 Project: Hadoop Issue Type: Test Components: dfs Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.16.0 Attachments: throughput.patch We should have a benchmark that measures reading a 10g file from hdfs and from local disk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2342) create a micro-benchmark to measure local-file versus hdfs read
[ https://issues.apache.org/jira/browse/HADOOP-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-2342: -- Attachment: (was: throughput.patch) create a micro-benchmark to measure local-file versus hdfs read Key: HADOOP-2342 URL: https://issues.apache.org/jira/browse/HADOOP-2342 Project: Hadoop Issue Type: Test Components: dfs Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.16.0 Attachments: throughput.patch We should have a benchmark that measures reading a 10g file from hdfs and from local disk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2359) PendingReplicationMonitor thread received exception. java.lang.InterruptedException
[ https://issues.apache.org/jira/browse/HADOOP-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548847 ] Hadoop QA commented on HADOOP-2359: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12371067/replicationWarning.patch against trunk revision r601518. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests -1. The patch failed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1275/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1275/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1275/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1275/console This message is automatically generated. PendingReplicationMonitor thread received exception. java.lang.InterruptedException --- Key: HADOOP-2359 URL: https://issues.apache.org/jira/browse/HADOOP-2359 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.16.0 Reporter: Owen O'Malley Assignee: dhruba borthakur Fix For: 0.16.0 Attachments: replicationWarning.patch I sometimes get the message: 07/12/05 19:01:36 WARN fs.FSNamesystem: PendingReplicationMonitor thread received exception. java.lang.InterruptedException: sleep interrupted from mini-dfs cluster. InterruptedExceptions should be handled quietly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-4) tool to mount dfs on linux
[ https://issues.apache.org/jira/browse/HADOOP-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548852 ] Anurag Sharma commented on HADOOP-4: hi Doug. ok :- ), we will follow one of the alternate options you suggested of hosting either the patch or the jar file ourselves, and fixing the fuse-j-hadoop package build to work with this. Will re-submit our changes soon. -thanks, -anurag tool to mount dfs on linux -- Key: HADOOP-4 URL: https://issues.apache.org/jira/browse/HADOOP-4 Project: Hadoop Issue Type: Improvement Components: fs Affects Versions: 0.5.0 Environment: linux only Reporter: John Xing Assignee: Doug Cutting Attachments: fuse-hadoop-0.1.0_fuse-j.2.2.3_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.0_fuse-j.2.4_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.1.tar.gz, fuse-j-hadoopfs-0.1.zip, fuse-j-patch.zip tool to mount dfs on linux -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-4) tool to mount dfs on linux
[ https://issues.apache.org/jira/browse/HADOOP-4?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anurag Sharma updated HADOOP-4: --- Attachment: (was: fuse-j-patch.zip) tool to mount dfs on linux -- Key: HADOOP-4 URL: https://issues.apache.org/jira/browse/HADOOP-4 Project: Hadoop Issue Type: Improvement Components: fs Affects Versions: 0.5.0 Environment: linux only Reporter: John Xing Assignee: Doug Cutting Attachments: fuse-hadoop-0.1.0_fuse-j.2.2.3_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.0_fuse-j.2.4_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.1.tar.gz, fuse-j-hadoopfs-0.1.zip tool to mount dfs on linux -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2329) [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data
[ https://issues.apache.org/jira/browse/HADOOP-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548855 ] Bryan Duxbury commented on HADOOP-2329: --- I don't think there should be a type field. That's up to the application to deal with. It would add a ton of overhead to everything in HBase and require a huge overhaul of how stuff works. It would also take away a good deal of flexibility. The fact that the shell cannot understand user-supplied key/value based data types is not a good motivation for adding it. The shell should really only be an administrative utility anyway, just enough to be able to create and drop tables and to peek at a row here or there. I doubt that people who write their applications to use HBase are going to be limited by the lack of built-in data types. [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data Key: HADOOP-2329 URL: https://issues.apache.org/jira/browse/HADOOP-2329 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Fix For: 0.16.0 A built-in data type is a fundamental data type that the hbase shell defines. (character strings, scalars, ranges, arrays, ... , etc) If you need a specialized data type that is not currently provided as a built-in type, you are encouraged to write your own user-defined data type using UDC (not yet implemented). (or contribute it for distribution in a future release of hbase shell) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Reprioritize HBase issues in JIRA
Hey all, It seems to me like there's a lot of mis-prioritized HBase issues in the JIRA at the moment, since the default is Major. I'd like to give it a once-over and reprioritize the tickets, if no one objects. I think it would make our project easier to assess at a glance. -Bryan Duxbury
[jira] Created: (HADOOP-2361) hadoop version wrong in 0.15.1
hadoop version wrong in 0.15.1 -- Key: HADOOP-2361 URL: https://issues.apache.org/jira/browse/HADOOP-2361 Project: Hadoop Issue Type: Bug Components: build Affects Versions: 0.15.1 Reporter: lohit vijayarenu I downloaded 0.15.1 release, recompiled and executed ./bin/hadoop version. It says 0.15.2-dev picking it from build.xml -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2328) [Hbase Shell] Non-index join columns
[ https://issues.apache.org/jira/browse/HADOOP-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2328: -- Priority: Trivial (was: Major) [Hbase Shell] Non-index join columns Key: HADOOP-2328 URL: https://issues.apache.org/jira/browse/HADOOP-2328 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 Attachments: 2328.patch, 2328_v02.patch If we don't have an index for a domain in the join, we can still improve on the nested-loop join using sort join. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2283) [hbase] Stuck replay of failed regionserver edits
[ https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548674 ] Hadoop QA commented on HADOOP-2283: --- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12371004/compaction.patch against trunk revision r601232. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1271/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1271/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1271/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1271/console This message is automatically generated. [hbase] Stuck replay of failed regionserver edits - Key: HADOOP-2283 URL: https://issues.apache.org/jira/browse/HADOOP-2283 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Assignee: stack Fix For: 0.16.0 Attachments: compaction.patch, OP_READ.patch Looking in master for a cluster of ~90 regionservers, the regionserver carrying the ROOT went down (because it hadn't talked to the master in 30 seconds). Master notices the downed regionserver because its lease timesout. It then goes to run the shutdown server sequence only splitting the regionserver's edit log, it gets stuck trying to split the second of three log files. 
Eventually, after ~5 minutes, the second log split throws:
2007-11-26 01:21:23,999 WARN hbase.HMaster - Processing pending operations: ProcessServerShutdown of XX.XX.XX.XX:60020
org.apache.hadoop.dfs.AlreadyBeingCreatedException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /hbase/hregion_-1194436719/oldlogfile.log for DFSClient_610028837 on client XX.XX.XX.XX because current leaseholder is trying to recreate file.
  at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:848)
  at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:804)
  at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
  at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
  at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
  at org.apache.hadoop.hbase.HMaster.run(HMaster.java:1094)
And so on every 5 minutes. Because the regionserver that went down had ROOT region, and because we are stuck in this eternal loop, ROOT never gets reallocated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2264) Support an OutputFormat for HQL row data
[ https://issues.apache.org/jira/browse/HADOOP-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2264: -- Priority: Trivial (was: Major) Not major issue. Support an OutputFormat for HQL row data Key: HADOOP-2264 URL: https://issues.apache.org/jira/browse/HADOOP-2264 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: Paul Saab Assignee: Edward Yoon Priority: Trivial Currently when selecting a row, if the data does not convert to a String the hbase shell will print garbage. It would be nice if HQL supported a mechanism to format individual columns. Something along the lines of: select col1: format(SomeFormatClass), col2: format(AnotherFormatClass) from table -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2265) [Hbase Shell] Addition of LIKE operator for a select-condition
[ https://issues.apache.org/jira/browse/HADOOP-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2265: -- Priority: Trivial (was: Major) Not a major issue. [Hbase Shell] Addition of LIKE operator for a select-condition -- Key: HADOOP-2265 URL: https://issues.apache.org/jira/browse/HADOOP-2265 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.15.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 The LIKE operator is used in character string comparisons with pattern matching. With the LIKE operator, you can compare a value to a pattern rather than to a constant. SYNTAX : {code} [NOT] LIKE 'character' [ESCAPE 'character'] {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
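The proposed LIKE semantics map naturally onto Java regular expressions. As a hedged illustration only (this helper is hypothetical, not part of the HQL parser or the eventual patch), a shell could rewrite a LIKE pattern into a regex, honoring the optional ESCAPE character:

```java
// Hypothetical sketch: translating a SQL-style LIKE pattern to a Java regex.
// '%' matches any sequence, '_' a single character; the escape character
// lets a pattern match literal '%' or '_'. Names are illustrative.
public class LikePattern {
    public static String toRegex(String pattern, char escape) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == escape && i + 1 < pattern.length()) {
                // Escaped character: match it literally.
                regex.append(java.util.regex.Pattern.quote(
                    String.valueOf(pattern.charAt(++i))));
            } else if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append(".");
            } else {
                regex.append(java.util.regex.Pattern.quote(String.valueOf(c)));
            }
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        System.out.println("Harley davidson".matches(toRegex("Har%", '\\'))); // true
    }
}
```

The same translation would work for NOT LIKE by negating the match result.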
[jira] Updated: (HADOOP-2143) [Hbase Shell] Cell-value index option using lucene.
[ https://issues.apache.org/jira/browse/HADOOP-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2143: -- Priority: Trivial (was: Major) Not a major issue. [Hbase Shell] Cell-value index option using lucene. --- Key: HADOOP-2143 URL: https://issues.apache.org/jira/browse/HADOOP-2143 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.14.3 Environment: all environments Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 value, row-key1, row-key2[, row-key3] -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2329) [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data
[ https://issues.apache.org/jira/browse/HADOOP-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548869 ] Edward Yoon commented on HADOOP-2329: - The shell should really only be an administrative utility anyway, just enough to be able to create and drop tables and to peek at a row here or there. I don't think so. What do you think of this statement: "The shell should really only be an administrative utility anyway, just enough to *reboot* and *dir* and to peek at a *file name* here or there." It would add a ton of overhead to everything in HBase and require a huge overhaul of how stuff works. It would also take away a good deal of flexibility. I don't think so; internally you can still just use byte[]. Also, application developers need to do data modeling on Hbase. (It's very difficult in my experience, so the shell's guidance will be very useful.) I doubt that people who write their applications to use HBase are going to be limited by the lack of built-in data types. I don't think so. If you have studied databases and math, you can use DB solutions very powerfully. But many people (application developers) can't. Why? Please think about it some more. [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data Key: HADOOP-2329 URL: https://issues.apache.org/jira/browse/HADOOP-2329 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 A built-in data type is a fundamental data type that the hbase shell defines. (character strings, scalars, ranges, arrays, ... , etc) If you need a specialized data type that is not currently provided as a built-in type, you are encouraged to write your own user-defined data type using UDC (not yet implemented). (or contribute it for distribution in a future release of hbase shell) -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
RE: Reprioritize HBase issues in JIRA
I think this is a very subjective judgment of some application ability. -- B. Regards, Edward yoon @ NHN, corp. Home : http://www.udanax.org From: [EMAIL PROTECTED] To: hadoop-dev@lucene.apache.org Date: Wed, 5 Dec 2007 15:01:41 -0800 Subject: RE: Reprioritize HBase issues in JIRA Yes, that would be a big help. Go for it! And thanks for the help. --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: Bryan Duxbury [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 05, 2007 2:59 PM To: hadoop-dev@lucene.apache.org Subject: Reprioritize HBase issues in JIRA Hey all, It seems to me like there's a lot of mis-prioritized HBase issues in the JIRA at the moment, since the default is Major. I'd like to give it a once-over and reprioritize the tickets, if no one objects. I think it would make our project easier to assess at a glance. -Bryan Duxbury
[jira] Commented: (HADOOP-2339) [Hbase Shell] Delete command with no WHERE clause
[ https://issues.apache.org/jira/browse/HADOOP-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548871 ] Bryan Duxbury commented on HADOOP-2339: --- To clarify this issue, are we talking about what is essentially an ALTER TABLE DROP COLUMN in SQL? If so, the description should be changed to reflect that. [Hbase Shell] Delete command with no WHERE clause - Key: HADOOP-2339 URL: https://issues.apache.org/jira/browse/HADOOP-2339 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Fix For: 0.16.0 Attachments: 2339.patch, 2339_v02.patch, 2339_v03.patch, 2339_v04.patch using HbaseAdmin.deleteColumn() method. {code} DELETE column_name FROM table_name; {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2339) [Hbase Shell] Delete command with no WHERE clause
[ https://issues.apache.org/jira/browse/HADOOP-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2339: -- Priority: Minor (was: Major) Not a major issue, but should still be looked at. [Hbase Shell] Delete command with no WHERE clause - Key: HADOOP-2339 URL: https://issues.apache.org/jira/browse/HADOOP-2339 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Minor Fix For: 0.16.0 Attachments: 2339.patch, 2339_v02.patch, 2339_v03.patch, 2339_v04.patch using HbaseAdmin.deleteColumn() method. {code} DELETE column_name FROM table_name; {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2351) [Hbase Shell] If select command returns no result, it doesn't need to show the header information.
[ https://issues.apache.org/jira/browse/HADOOP-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2351: -- Priority: Trivial (was: Major) Doesn't make a functional change, only cosmetic. Not a major issue. [Hbase Shell] If select command returns no result, it doesn't need to show the header information. -- Key: HADOOP-2351 URL: https://issues.apache.org/jira/browse/HADOOP-2351 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 Attachments: 2351.patch {code} hql select * from udanax; +-+-+-+ | Row | Column | Cell | +-+-+-+ 0 row(s) in set. (0.09 sec) hql exit; {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2006) Aggregate Functions in select statement
[ https://issues.apache.org/jira/browse/HADOOP-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548874 ] Bryan Duxbury commented on HADOOP-2006: --- This seems like a bad idea. You could have TONS of data, and aggregating it in one place would take forever. If you want to produce aggregate info, you should probably fire off a Map Reduce job, no? Aggregate Functions in select statement --- Key: HADOOP-2006 URL: https://issues.apache.org/jira/browse/HADOOP-2006 Project: Hadoop Issue Type: Sub-task Components: contrib/hbase Affects Versions: 0.14.1 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Minor Fix For: 0.16.0 Aggregation functions on collections of data values: average, minimum, maximum, sum, count. Group rows by the value of a columnfamily and apply the aggregate function independently to each group of rows. * Grouping columnfamilies ƒ ~function_list~ (Relation) {code} select producer, avg(year) from movieLog_table group by producer {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
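The aggregation being debated, e.g. avg(year) grouped by producer, is a group-then-fold. A minimal plain-Java sketch of that computation follows; names and the two-column row layout are illustrative, and in the distributed setting Bryan suggests, the grouping would be Hadoop's shuffle and the averaging a reduce step rather than a single-process stream:

```java
import java.util.*;
import java.util.stream.*;

public class GroupByAverage {
    // Group rows by producer (column 0) and average the year (column 1).
    // A self-contained stand-in for "select producer, avg(year) ... group by producer".
    public static Map<String, Double> avgYearByProducer(List<String[]> rows) {
        return rows.stream().collect(Collectors.groupingBy(
            r -> r[0],                                          // key: producer
            Collectors.averagingDouble(r -> Double.parseDouble(r[1]))));
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[]{"producerA", "1999"},
            new String[]{"producerA", "2001"},
            new String[]{"producerB", "2005"});
        System.out.println(avgYearByProducer(rows));
    }
}
```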
[jira] Commented: (HADOOP-2361) hadoop version wrong in 0.15.1
[ https://issues.apache.org/jira/browse/HADOOP-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548875 ] lohit vijayarenu commented on HADOOP-2361: -- my bad, looks like i picked 0.15 branch instead of tag. closing this as invalid hadoop version wrong in 0.15.1 -- Key: HADOOP-2361 URL: https://issues.apache.org/jira/browse/HADOOP-2361 Project: Hadoop Issue Type: Bug Components: build Affects Versions: 0.15.1 Reporter: lohit vijayarenu I downloaded 0.15.1 release, recompiled and executed ./bin/hadoop version. It says 0.15.2-dev picking it from build.xml -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HADOOP-2361) hadoop version wrong in 0.15.1
[ https://issues.apache.org/jira/browse/HADOOP-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu resolved HADOOP-2361. -- Resolution: Invalid hadoop version wrong in 0.15.1 -- Key: HADOOP-2361 URL: https://issues.apache.org/jira/browse/HADOOP-2361 Project: Hadoop Issue Type: Bug Components: build Affects Versions: 0.15.1 Reporter: lohit vijayarenu I downloaded 0.15.1 release, recompiled and executed ./bin/hadoop version. It says 0.15.2-dev picking it from build.xml -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2338) [hbase] NPE in master server
[ https://issues.apache.org/jira/browse/HADOOP-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2338: -- Priority: Critical (was: Major) This is an important issue. [hbase] NPE in master server Key: HADOOP-2338 URL: https://issues.apache.org/jira/browse/HADOOP-2338 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Jim Kellerman Assignee: Jim Kellerman Priority: Critical Fix For: 0.16.0 Attachments: master.log.gz, patch.txt Master gets an NPE after receiving multiple responses from the same server telling the master it has opened a region.
{code}
2007-12-02 20:31:37,515 DEBUG hbase.HRegion - Next sequence id for region postlog,img254/577/02suecia024richardburnson0.jpg,1196619667879 is 73377537
2007-12-02 20:31:37,517 INFO hbase.HRegion - region postlog,img254/577/02suecia024richardburnson0.jpg,1196619667879 available
2007-12-02 20:31:39,200 WARN hbase.HRegionServer - Processing message (Retry: 0)
java.io.IOException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hbase.HMaster.processMsgs(HMaster.java:1484)
  at org.apache.hadoop.hbase.HMaster.regionServerReport(HMaster.java:1423)
  at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
  at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
  at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48)
  at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:759)
  at java.lang.Thread.run(Thread.java:619)

case HMsg.MSG_REPORT_PROCESS_OPEN:
  synchronized (this.assignAttempts) {
    // Region server has acknowledged request to open region.
    // Extend region open time by 1/2 max region open time.
    **1484** assignAttempts.put(region.getRegionName(),
        Long.valueOf(assignAttempts.get(region.getRegionName()).longValue()
            + (this.maxRegionOpenTime / 2)));
  }
  break;
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
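For reference, the failure mode at line 1484 is the unguarded Map.get(...).longValue() chain: if a duplicate "region opened" acknowledgment arrives after the entry has been removed, get() returns null and the longValue() call throws. The guard pattern can be shown standalone; this is an illustrative plain-Java sketch that mirrors the snippet's names, not the attached patch.txt:

```java
import java.util.*;

public class AssignAttempts {
    // Illustrative stand-in for HMaster's assignAttempts map: region name -> deadline.
    private final Map<String, Long> assignAttempts = new HashMap<>();
    private final long maxRegionOpenTime = 60_000L;

    public void markPending(String regionName, long deadline) {
        assignAttempts.put(regionName, deadline);
    }

    // Extend the deadline by half the max open time, but only if the region
    // is still pending. The null check is what the original code lacked.
    public boolean extendDeadline(String regionName) {
        Long current = assignAttempts.get(regionName);
        if (current == null) {
            return false; // region no longer pending; ignore the duplicate ack
        }
        assignAttempts.put(regionName, current + maxRegionOpenTime / 2);
        return true;
    }

    public Long deadline(String regionName) {
        return assignAttempts.get(regionName);
    }
}
```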
[jira] Commented: (HADOOP-2329) [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data
[ https://issues.apache.org/jira/browse/HADOOP-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548877 ] Jim Kellerman commented on HADOOP-2329: --- One of the stated goals of the HBase project is to produce a system as similar to Bigtable as possible (see http://wiki.apache.org/lucene-hadoop/Hbase#goals). In this spirit, HBase will remain typeless and it is likely that we will go ahead with HADOOP-2334 (making row keys WritableComparable instead of Text) once we get a chance to breathe after getting out from under the major bugs. [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data Key: HADOOP-2329 URL: https://issues.apache.org/jira/browse/HADOOP-2329 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 A built-in data type is a fundamental data type that the hbase shell defines. (character strings, scalars, ranges, arrays, ... , etc) If you need a specialized data type that is not currently provided as a built-in type, you are encouraged to write your own user-defined data type using UDC (not yet implemented). (or contribute it for distribution in a future release of hbase shell) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.
[ https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HADOOP-2185: - Resolution: Fixed Status: Resolved (was: Patch Available) I just committed this. Thanks Konstantin! Server ports: to roll or not to roll. - Key: HADOOP-2185 URL: https://issues.apache.org/jira/browse/HADOOP-2185 Project: Hadoop Issue Type: Improvement Components: conf, dfs, mapred Affects Versions: 0.15.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 0.16.0 Attachments: FixedPorts3.patch, FixedPorts4.patch, port.stack Looked at the issues related to port rolling. My impression is that port rolling is required only for the unit tests to run. Even the name-node port should roll there, which we don't have now, in order to be able to start 2 clusters for testing, say, distcp. For real clusters, on the contrary, port rolling is not desired and sometimes even prohibited. So we should have a way to ban port rolling. My proposition is to
# use ephemeral port 0 if port rolling is desired
# if a specific port is specified then port rolling should not happen at all, meaning that a server is either able or not able to start on that particular port.
The desired port is specified via configuration parameters.
- Name-node: fs.default.name = host:port
- Data-node: dfs.datanode.port
- Job-tracker: mapred.job.tracker = host:port
- Task-tracker: mapred.task.tracker.report.bindAddress = host Task-tracker currently does not have an option to specify the port; it always uses the ephemeral port 0, and therefore I propose to add one.
- Secondary node does not need a port to listen on.
For info servers we have two sets of config variables *.info.bindAddress and *.info.port except for the task tracker, which calls them *.http.bindAddress and *.http.port instead of info. 
With respect to the info servers I propose to completely eliminate the port parameters, and form *.info.bindAddress = host:port. Info servers should do the same thing, namely start or fail on the specified port if it is not 0, and start on any free port if it is ephemeral. For the task-tracker I would rename tasktracker.http.bindAddress to mapred.task.tracker.info.bindAddress. For the data-node the info dfs.datanode.info.bindAddress should be included in the default config. Is there a reason why it is not there? This is the summary of proposed changes:
|| Server || current name = value || proposed name = value ||
| NameNode | fs.default.name = host:port | same |
| | dfs.info.bindAddress = host | dfs.http.bindAddress = host:port |
| DataNode | dfs.datanode.bindAddress = host | dfs.datanode.bindAddress = host:port |
| | dfs.datanode.port = port | eliminate |
| | dfs.datanode.info.bindAddress = host | dfs.datanode.http.bindAddress = host:port |
| | dfs.datanode.info.port = port | eliminate |
| JobTracker | mapred.job.tracker = host:port | same |
| | mapred.job.tracker.info.bindAddress = host | mapred.job.tracker.http.bindAddress = host:port |
| | mapred.job.tracker.info.port = port | eliminate |
| TaskTracker | mapred.task.tracker.report.bindAddress = host | mapred.task.tracker.report.bindAddress = host:port |
| | tasktracker.http.bindAddress = host | mapred.task.tracker.http.bindAddress = host:port |
| | tasktracker.http.port = port | eliminate |
| SecondaryNameNode | dfs.secondary.info.bindAddress = host | dfs.secondary.http.bindAddress = host:port |
| | dfs.secondary.info.port = port | eliminate |
Do we also want to set some uniform naming convention for the configuration variables? Like having hdfs instead of dfs, or info instead of http, or systematically using either datanode or data.node would make that look better in my opinion. So these are all +*api*+ changes. 
I would +*really*+ like some feedback on this, especially from people who deal with configuration issues on practice. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
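The core convention the proposal leans on — port 0 means "pick any free port" while a fixed port means "bind or fail" — is exactly how Java server sockets already behave, so no retry loop is needed on either path. A small self-contained sketch (class and method names are illustrative, not Hadoop code):

```java
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPortDemo {
    // Bind to the requested port. Port 0 asks the OS for any free port
    // (the "rolling allowed" case); a non-zero port either binds or
    // throws, which is the "no rolling" behavior the proposal wants.
    public static int bind(int requestedPort) throws IOException {
        try (ServerSocket server = new ServerSocket(requestedPort)) {
            return server.getLocalPort(); // the actual port the OS assigned
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("got ephemeral port " + bind(0));
    }
}
```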
[jira] Updated: (HADOOP-2329) [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data
[ https://issues.apache.org/jira/browse/HADOOP-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2329: -- Priority: Trivial (was: Major) Not a major issue. [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data Key: HADOOP-2329 URL: https://issues.apache.org/jira/browse/HADOOP-2329 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 A built-in data type is a fundamental data type that the hbase shell defines. (character strings, scalars, ranges, arrays, ... , etc) If you need a specialized data type that is not currently provided as a built-in type, you are encouraged to write your own user-defined data type using UDC (not yet implemented). (or contribute it for distribution in a future release of hbase shell) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2006) Aggregate Functions in select statement
[ https://issues.apache.org/jira/browse/HADOOP-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548879 ] Edward Yoon commented on HADOOP-2006: - I don't understand your comment. Please explain in more detail. Aggregate Functions in select statement --- Key: HADOOP-2006 URL: https://issues.apache.org/jira/browse/HADOOP-2006 Project: Hadoop Issue Type: Sub-task Components: contrib/hbase Affects Versions: 0.14.1 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Minor Fix For: 0.16.0 Aggregation functions on collections of data values: average, minimum, maximum, sum, count. Group rows by the value of a columnfamily and apply the aggregate function independently to each group of rows. * Grouping columnfamilies ƒ ~function_list~ (Relation) {code} select producer, avg(year) from movieLog_table group by producer {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2012) Periodic verification at the Datanode
[ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548761 ] Raghu Angadi commented on HADOOP-2012: -- This is particularly vexing when design choices could clearly be made that would avoid these issues. Our initial design did not modify metadata files on the Datanode; that was my preference too. All this stems from the fact that we are modifying these files. Periodic verification at the Datanode - Key: HADOOP-2012 URL: https://issues.apache.org/jira/browse/HADOOP-2012 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Raghu Angadi Assignee: Raghu Angadi Fix For: 0.16.0 Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch Currently on-disk data corruption on data blocks is detected only when it is read by the client or by another datanode. These errors are detected much earlier if the datanode can periodically verify the data checksums for the local blocks. Some of the issues to consider:
- How should we check the blocks (no more often than once every couple of weeks?)
- How do we keep track of when a block was last verified (there is a .meta file associated with each block).
- What action to take once a corruption is detected
- Scanning should be done as a very low priority with the rest of the datanode disk traffic in mind.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
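The verification step itself is mechanically simple; the open questions above are about scheduling and bookkeeping. As a simplified, self-contained sketch of the check (a single CRC32 over the whole block for illustration, whereas the real datanode .meta files store per-chunk checksums):

```java
import java.util.zip.CRC32;

public class BlockVerifier {
    // Recompute a checksum over a block's bytes and compare against the
    // stored value. Illustrative only: the actual datanode format is
    // per-chunk CRCs, not one checksum per block.
    public static boolean verify(byte[] blockData, long storedChecksum) {
        CRC32 crc = new CRC32();
        crc.update(blockData, 0, blockData.length);
        return crc.getValue() == storedChecksum;
    }
}
```

A periodic scanner would call something like this per block at low I/O priority and report mismatches to the namenode for re-replication.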
RE: Reprioritize HBase issues in JIRA
Yes, that would be a big help. Go for it! And thanks for the help. --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: Bryan Duxbury [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 05, 2007 2:59 PM To: hadoop-dev@lucene.apache.org Subject: Reprioritize HBase issues in JIRA Hey all, It seems to me like there's a lot of mis-prioritized HBase issues in the JIRA at the moment, since the default is Major. I'd like to give it a once-over and reprioritize the tickets, if no one objects. I think it would make our project easier to assess at a glance. -Bryan Duxbury
[jira] Updated: (HADOOP-496) Expose HDFS as a WebDAV store
[ https://issues.apache.org/jira/browse/HADOOP-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anurag Sharma updated HADOOP-496: - Attachment: (was: fuse-j-patch.zip) Expose HDFS as a WebDAV store - Key: HADOOP-496 URL: https://issues.apache.org/jira/browse/HADOOP-496 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Michel Tourn Assignee: Enis Soztutar Attachments: hadoop-496-3.patch, hadoop-496-4.patch, hadoop-496-spool-cleanup.patch, hadoop-webdav.zip, jetty-slide.xml, lib.webdav.tar.gz, screenshot-1.jpg, slideusers.properties, webdav_wip1.patch, webdav_wip2.patch WebDAV stands for Distributed Authoring and Versioning. It is a set of extensions to the HTTP protocol that lets users collaboratively edit and manage files on a remote web server. It is often considered a replacement for NFS or SAMBA. HDFS (Hadoop Distributed File System) needs a friendly file system interface. DFSShell commands are unfamiliar. Instead, it is more convenient for Hadoop users to use a mountable network drive. A friendly interface to HDFS will be used both for casual browsing of data and for bulk import/export. The FUSE provider for HDFS is already available ( http://issues.apache.org/jira/browse/HADOOP-17 ) but it had scalability problems. WebDAV is a popular alternative. The typical licensing terms for WebDAV tools are also attractive: GPL for Linux client tools that Hadoop would not redistribute anyway. More importantly, Apache Project/Apache license for Java tools and for server components. This allows for a tighter integration with the HDFS code base. There are some interesting Apache projects that support WebDAV. 
But these are probably too heavyweight for the needs of Hadoop: Tomcat servlet: http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/servlets/WebdavServlet.html Slide: http://jakarta.apache.org/slide/ Being HTTP-based and backwards-compatible with Web Browser clients, the WebDAV server protocol could even be piggy-backed on the existing Web UI ports of the Hadoop name node / data nodes. WebDAV can be hosted as (Jetty) servlets. This minimizes server code bloat and this avoids additional network traffic between HDFS and the WebDAV server. General Clients (read-only): Any web browser Linux Clients: Mountable GPL davfs2 http://dav.sourceforge.net/ FTP-like GPL Cadaver http://www.webdav.org/cadaver/ Server Protocol compliance tests: http://www.webdav.org/neon/litmus/ A goal is for Hadoop HDFS to pass this test (minus support for Properties) Pure Java clients: DAV Explorer Apache lic. http://www.ics.uci.edu/~webdav/ WebDAV also makes it convenient to add advanced features in an incremental fashion: file locking, access control lists, hard links, symbolic links. New WebDAV standards get accepted and more or less featured WebDAV clients exist. core http://www.webdav.org/specs/rfc2518.html ACLs http://www.webdav.org/specs/rfc3744.html redirects soft links http://greenbytes.de/tech/webdav/rfc4437.html BIND hard links http://www.webdav.org/bind/ quota http://tools.ietf.org/html/rfc4331 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2283) [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits)
[ https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HADOOP-2283: -- Priority: Minor (was: Major) Summary: [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits) (was: [hbase] Stuck replay of failed regionserver edits) AlreadyBeingCreatedException was seen last night in a Bryan Duxbury upload (Added ABCE to title). Committed the compaction.patch as part of HADOOP-2357. [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits) - Key: HADOOP-2283 URL: https://issues.apache.org/jira/browse/HADOOP-2283 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Assignee: stack Priority: Minor Fix For: 0.16.0 Attachments: compaction.patch, OP_READ.patch Looking in the master for a cluster of ~90 regionservers, the regionserver carrying the ROOT region went down (because it hadn't talked to the master in 30 seconds). The master notices the downed regionserver because its lease times out. It then runs the server shutdown sequence; while splitting the regionserver's edit log, it gets stuck trying to split the second of three log files. Eventually, after ~5 minutes, the second log split throws: 34974 2007-11-26 01:21:23,999 WARN hbase.HMaster - Processing pending operations: ProcessServerShutdown of XX.XX.XX.XX:60020 34975 org.apache.hadoop.dfs.AlreadyBeingCreatedException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /hbase/hregion_-1194436719/oldlogfile.log for DFSClient_610028837 on client XX.XX.XX.XX because current leaseholder is trying to recreate file. 
34976 at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:848) 34977 at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:804) 34978 at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276) 34979 at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source) 34980 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 34981 at java.lang.reflect.Method.invoke(Method.java:597) 34982 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) 34983 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) 34984 34985 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 34986 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) 34987 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) 34988 at java.lang.reflect.Constructor.newInstance(Constructor.java:513) 34989 at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82) 34990 at org.apache.hadoop.hbase.HMaster.run(HMaster.java:1094) And so on every 5 minutes. Because the regionserver that went down had ROOT region, and because we are stuck in this eternal loop, ROOT never gets reallocated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2329) [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data
[ https://issues.apache.org/jira/browse/HADOOP-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548893 ] Edward Yoon commented on HADOOP-2329: - Are you proposing to do the data types entirely outside of HBase or leveraging HADOOP-2197? Or do you want internal support for data types? Yes, I'm thinking of the former. [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data Key: HADOOP-2329 URL: https://issues.apache.org/jira/browse/HADOOP-2329 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 A built-in data type is a fundamental data type that the hbase shell defines. (character strings, scalars, ranges, arrays, ... , etc) If you need a specialized data type that is not currently provided as a built-in type, you are encouraged to write your own user-defined data type using UDC (not yet implemented). (or contribute it for distribution in a future release of hbase shell) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: [jira] Commented: (HADOOP-2006) Aggregate Functions in select statement
it will encourage people to think that the shell is a good way to interact with HBase in general... I think this is a key point. :) The HBase Shell's aim is to improve working efficiency without demanding specialized knowledge. I'll make it an accessory for database access methods on HBase. I'm also thinking about matrix operations on HBase. But the HBase Shell is just one of the applications built on HBase. Think of it this way: if you mistakenly believe standard SQL is the whole of a DBMS's capability, you probably won't study that DBMS's storage structures, access algorithms, design philosophy, and so on. Could I then force you to use 100% of the DBMS's capability? Conversely, suppose the DBMS didn't provide standard SQL at all. Would you still want to use it? If you would, then you never thought SQL was all the DBMS had. So, my conclusion: the richer the HBase shell, the more rapidly the use of HBase will grow. -- B. Regards, Edward yoon @ NHN, corp. Home : http://www.udanax.org From: [EMAIL PROTECTED] Subject: Re: [jira] Commented: (HADOOP-2006) Aggregate Functions in select statement Date: Wed, 5 Dec 2007 15:50:50 -0800 To: hadoop-dev@lucene.apache.org If you have a table with something like a billion rows, and do an aggregate function on the table from the shell, you will end up reading all billion rows through a single machine, essentially aggregating the entire dataset locally. This defeats the purpose of having a massively distributed database like HBase. To do this more efficiently, you'd ideally kick off a MapReduce job that can perform the various aggregation functions on the dataset in parallel, harnessing the power of the distributed dataset, and then returning the results to a central location once they are calculated. I think putting this option into the shell is risky, because it will encourage people to think that the shell is a good way to interact with HBase in general, which it isn't. 
We want people to understand HBase is best consumed in parallel and discourage solutions that aggregate access through a single point. As such, we shouldn't build features that allow people to inadvertently use the wrong access patterns. On Dec 5, 2007, at 3:38 PM, Edward Yoon (JIRA) wrote: [ https://issues.apache.org/jira/browse/HADOOP-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548879 ] Edward Yoon commented on HADOOP-2006: - I don't understand your comment. Please give me more explanation. Aggregate Functions in select statement --- Key: HADOOP-2006 URL: https://issues.apache.org/jira/browse/HADOOP-2006 Project: Hadoop Issue Type: Sub-task Components: contrib/hbase Affects Versions: 0.14.1 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Minor Fix For: 0.16.0 Aggregation functions on collections of data values: average, minimum, maximum, sum, count. Group rows by the value of a column family and apply an aggregate function independently to each group of rows. * ƒ ~function_list~ (Relation) {code} select producer, avg(year) from movieLog_table group by producer {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
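The parallel aggregation described in this thread — partial (sum, count) pairs computed per split, then merged centrally — can be sketched without any Hadoop API. All class and method names below are hypothetical; in a real job, `partial` would live in a mapper/combiner and `merge` in a reducer, so only small per-group aggregates cross the network instead of every row.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative MapReduce-style computation of avg(year) group by producer.
public class GroupAverage {
    // One partial aggregate per group, as a mapper/combiner would emit.
    public static Map<String, long[]> partial(List<String[]> rows) {
        Map<String, long[]> acc = new HashMap<>();
        for (String[] row : rows) {                 // row = {producer, year}
            long[] sc = acc.computeIfAbsent(row[0], k -> new long[2]);
            sc[0] += Long.parseLong(row[1]);        // running sum
            sc[1] += 1;                             // running count
        }
        return acc;
    }

    // Reducer side: merge partials from all splits, then finish the average.
    public static Map<String, Double> merge(List<Map<String, long[]>> partials) {
        Map<String, long[]> total = new HashMap<>();
        for (Map<String, long[]> p : partials) {
            p.forEach((k, sc) -> {
                long[] t = total.computeIfAbsent(k, x -> new long[2]);
                t[0] += sc[0];
                t[1] += sc[1];
            });
        }
        Map<String, Double> avg = new HashMap<>();
        total.forEach((k, sc) -> avg.put(k, (double) sc[0] / sc[1]));
        return avg;
    }
}
```

The key property is that (sum, count) pairs are mergeable, so the average is computed exactly even though no single machine ever sees all the rows.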
[jira] Updated: (HADOOP-2311) [hbase] Could not complete hdfs write out to flush file forcing regionserver restart
[ https://issues.apache.org/jira/browse/HADOOP-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2311: -- Priority: Critical (was: Minor) Sounds serious. Changing to critical. [hbase] Could not complete hdfs write out to flush file forcing regionserver restart Key: HADOOP-2311 URL: https://issues.apache.org/jira/browse/HADOOP-2311 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Priority: Critical Attachments: delete-logging.patch I've spent some time looking into this issue but there are not enough clues in the logs to tell where the problem is. Here's what I know. Two region servers went down last night, a minute apart, during Paul Saab's 6hr run inserting 300million rows into hbase. The regionservers went down to force rerun of hlog and avoid possible data loss after a failure writing memory flushes to hdfs. Here is the lead up to the failed flush: ... 2007-11-28 22:40:02,231 INFO hbase.HRegionServer - MSG_REGION_OPEN : regionname: postlog,img149/4699/133lm0.jpg,1196318393738, startKey: img149/4699/133lm0.jpg, tableDesc: {name: postlog, families: {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}} 2007-11-28 22:40:02,242 DEBUG hbase.HStore - starting 1703405830/cookie (no reconstruction log) 2007-11-28 22:40:02,741 DEBUG hbase.HStore - maximum sequence id for hstore 1703405830/cookie is 29077708 2007-11-28 22:40:03,094 DEBUG hbase.HStore - starting 1703405830/ip (no reconstruction log) 2007-11-28 22:40:03,852 DEBUG hbase.HStore - maximum sequence id for hstore 1703405830/ip is 29077708 2007-11-28 22:40:04,138 DEBUG hbase.HRegion - Next sequence id for region postlog,img149/4699/133lm0.jpg,1196318393738 is 29077709 2007-11-28 22:40:04,141 INFO hbase.HRegion - region 
postlog,img149/4699/133lm0.jpg,1196318393738 available 2007-11-28 22:40:04,141 DEBUG hbase.HLog - changing sequence number from 21357623 to 29077709 2007-11-28 22:40:04,141 INFO hbase.HRegionServer - MSG_REGION_OPEN : regionname: postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739, startKey: img149/7512/dscnlightenedfi3.jpg, tableDesc: {name: postlog, families: {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}} 2007-11-28 22:40:04,145 DEBUG hbase.HStore - starting 376748222/cookie (no reconstruction log) 2007-11-28 22:40:04,223 DEBUG hbase.HStore - maximum sequence id for hstore 376748222/cookie is 29077708 2007-11-28 22:40:04,277 DEBUG hbase.HStore - starting 376748222/ip (no reconstruction log) 2007-11-28 22:40:04,353 DEBUG hbase.HStore - maximum sequence id for hstore 376748222/ip is 29077708 2007-11-28 22:40:04,699 DEBUG hbase.HRegion - Next sequence id for region postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 is 29077709 2007-11-28 22:40:04,701 INFO hbase.HRegion - region postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 available 2007-11-28 22:40:34,427 DEBUG hbase.HRegionServer - flushing region postlog,img143/1310/yashrk3.jpg,1196317258704 2007-11-28 22:40:34,428 DEBUG hbase.HRegion - Not flushing cache for region postlog,img143/1310/yashrk3.jpg,1196317258704: snapshotMemcaches() determined that there was nothing to do 2007-11-28 22:40:55,745 DEBUG hbase.HRegionServer - flushing region postlog,img142/8773/1001417zc4.jpg,1196317258703 2007-11-28 22:40:55,745 DEBUG hbase.HRegion - Not flushing cache for region postlog,img142/8773/1001417zc4.jpg,1196317258703: snapshotMemcaches() determined that there was nothing to do 2007-11-28 22:41:04,144 DEBUG hbase.HRegionServer - flushing region postlog,img149/4699/133lm0.jpg,1196318393738 2007-11-28 22:41:04,144 DEBUG 
hbase.HRegion - Started memcache flush for region postlog,img149/4699/133lm0.jpg,1196318393738. Size 74.7k 2007-11-28 22:41:04,764 DEBUG hbase.HStore - Added 1703405830/ip/610047924323344967 with sequence id 29081563 and size 53.8k 2007-11-28 22:41:04,902 DEBUG hbase.HStore - Added 1703405830/cookie/3147798053949544972 with sequence id 29081563 and size 41.3k 2007-11-28 22:41:04,902 DEBUG hbase.HRegion - Finished memcache flush for region postlog,img149/4699/133lm0.jpg,1196318393738 in 758ms, sequenceid=29081563 2007-11-28 22:41:04,902 DEBUG hbase.HStore - compaction for HStore postlog,img149/4699/133lm0.jpg,1196318393738/ip needed. 2007-11-28 22:41:04,903
[jira] Updated: (HADOOP-1550) [hbase] No means of deleting a'row' nor all members of a column family
[ https://issues.apache.org/jira/browse/HADOOP-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-1550: -- Priority: Major (was: Minor) Seems like this is pretty important for the API to be complete. Elevating to Major. [hbase] No means of deleting a'row' nor all members of a column family -- Key: HADOOP-1550 URL: https://issues.apache.org/jira/browse/HADOOP-1550 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: stack There is no support in hbase currently for deleting a row -- i.e. remove all columns and their versions keyed by a particular row id. Nor is there a means of passing in a row id and column family name having hbase delete all members of the column family (for the designated row). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
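To make the requested semantics concrete, here is a toy in-memory model of the two missing operations. The class, its method names, and the `family:qualifier` key convention are illustrative assumptions, not HBase's actual API or storage layout.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of the missing API: delete a whole row, or all members of one
// column family within a row.
public class RowDeletes {
    // row -> (column "family:qualifier" -> value)
    private final Map<String, Map<String, byte[]>> table = new TreeMap<>();

    public void put(String row, String column, byte[] value) {
        table.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    // Remove all columns (across every family) keyed by this row id.
    public void deleteRow(String row) {
        table.remove(row);
    }

    // Remove only the members of one column family for this row.
    public void deleteFamily(String row, String family) {
        Map<String, byte[]> cols = table.get(row);
        if (cols != null) {
            cols.keySet().removeIf(c -> c.startsWith(family + ":"));
            if (cols.isEmpty()) table.remove(row);
        }
    }

    public int columnCount(String row) {
        Map<String, byte[]> cols = table.get(row);
        return cols == null ? 0 : cols.size();
    }
}
```

A real implementation would also have to tombstone every stored version of each cell, not just the latest, which is what makes this more than a client-side loop over `put`-style deletes.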
[jira] Updated: (HADOOP-2243) [hbase] getRow returns empty Map if no-such row.. should return null
[ https://issues.apache.org/jira/browse/HADOOP-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2243: -- Priority: Major (was: Minor) Results in ambiguous answer about existence of a cell, so elevating to Major. [hbase] getRow returns empty Map if no-such row.. should return null Key: HADOOP-2243 URL: https://issues.apache.org/jira/browse/HADOOP-2243 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Found by Bryan Duxbury. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [jira] Commented: (HADOOP-2006) Aggregate Functions in select statement
If you have a table with something like a billion rows, and do an aggregate function on the table from the shell, you will end up reading all billion rows through a single machine, essentially aggregating the entire dataset locally. This defeats the purpose of having a massively distributed database like HBase. To do this more efficiently, you'd ideally kick off a MapReduce job that can perform the various aggregation functions on the dataset in parallel, harnessing the power of the distributed dataset, and then returning the results to a central location once they are calculated. I think putting this option into the shell is risky, because it will encourage people to think that the shell is a good way to interact with HBase in general, which it isn't. We want people to understand HBase is best consumed in parallel and discourage solutions that aggregate access through a single point. As such, we shouldn't build features that allow people to inadvertently use the wrong access patterns. On Dec 5, 2007, at 3:38 PM, Edward Yoon (JIRA) wrote: [ https://issues.apache.org/jira/browse/HADOOP-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548879 ] Edward Yoon commented on HADOOP-2006: - I don't understand your comment. Please give me more explanation. Aggregate Functions in select statement --- Key: HADOOP-2006 URL: https://issues.apache.org/jira/browse/HADOOP-2006 Project: Hadoop Issue Type: Sub-task Components: contrib/hbase Affects Versions: 0.14.1 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Minor Fix For: 0.16.0 Aggregation functions on collections of data values: average, minimum, maximum, sum, count. Group rows by the value of a column family and apply an aggregate function independently to each group of rows. * Grouping columnfamilies ƒ ~function_list~ (Relation) {code} select producer, avg(year) from movieLog_table group by producer {code} -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2362) [hbase] Leaking hdfs file handle
[hbase] Leaking hdfs file handle Key: HADOOP-2362 URL: https://issues.apache.org/jira/browse/HADOOP-2362 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Priority: Minor Fix For: 0.16.0 Found a leaking filehandle researching HADOOP-2341. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2351) [Hbase Shell] If select command returns no result, it doesn't need to show the header information.
[ https://issues.apache.org/jira/browse/HADOOP-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548619 ] Hadoop QA commented on HADOOP-2351: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12370992/2351.patch against trunk revision r601232. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests -1. The patch failed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1269/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1269/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1269/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1269/console This message is automatically generated. [Hbase Shell] If select command returns no result, it doesn't need to show the header information. -- Key: HADOOP-2351 URL: https://issues.apache.org/jira/browse/HADOOP-2351 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Fix For: 0.16.0 Attachments: 2351.patch {code} hql select * from udanax; +-+-+-+ | Row | Column | Cell | +-+-+-+ 0 row(s) in set. (0.09 sec) hql exit; {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2350) hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows
[ https://issues.apache.org/jira/browse/HADOOP-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2350: -- Priority: Critical (was: Major) Bump priority because it is a correctness issue hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows --- Key: HADOOP-2350 URL: https://issues.apache.org/jira/browse/HADOOP-2350 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Michael Bieniosek Assignee: stack Priority: Critical Fix For: 0.16.0 Attachments: TestScannerAPI.java I'm attaching a test case that fails. I noticed that if I create a table with two column families, and start a scanner on a row which only has an entry for one column family, the scanner will skip ahead to the row name for which the other column family has an entry. eg., if I insert rows so my table will look like this: {code} row - a:a - b:b aaa a:1 nil bbb a:2 b:2 ccc a:3 b:3 {code} The scanner will tell me my table looks something like this: {code} row - a:a - b:b bbb a:1 b:2 bbb a:2 b:3 {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
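The fix the attached test case implies is that a scanner must align column families by row key: take the union of row keys across the per-family stores and emit a gap (here `null`) where a family has no entry, rather than pairing the families' values positionally. A minimal sketch of that alignment, with `TreeMap`s standing in for the per-family stores (an assumption for illustration, not the actual HStore structures):

```java
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative row-key alignment across two column-family stores.
public class AlignedScan {
    // Returns rows as {rowKey, valueFromA, valueFromB}, sorted by row key;
    // a null entry means that family has no cell for the row.
    public static String[][] scan(TreeMap<String, String> famA,
                                  TreeMap<String, String> famB) {
        // Union of row keys, in sorted scan order.
        TreeSet<String> rows = new TreeSet<>(famA.keySet());
        rows.addAll(famB.keySet());
        String[][] out = new String[rows.size()][3];
        int i = 0;
        for (String row : rows) {
            out[i][0] = row;
            out[i][1] = famA.get(row); // null if this family skips the row
            out[i][2] = famB.get(row);
            i++;
        }
        return out;
    }
}
```

With the table from the report (`aaa` present only in `a:`), this yields `aaa/1/null`, `bbb/2/2`, `ccc/3/3` — never a value attributed to the wrong row name.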
[jira] Created: (HADOOP-2355) Set region split size on table creation
Set region split size on table creation --- Key: HADOOP-2355 URL: https://issues.apache.org/jira/browse/HADOOP-2355 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: Paul Saab Priority: Minor Right now the region size before a split is determined by a global configuration. It would be nice to configure tables independently of the global parameter. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2363) Unit tests fail if there is another instance of Hadoop
Unit tests fail if there is another instance of Hadoop -- Key: HADOOP-2363 URL: https://issues.apache.org/jira/browse/HADOOP-2363 Project: Hadoop Issue Type: Bug Components: test Reporter: Raghu Angadi Assignee: Konstantin Shvachko If you are running another Hadoop cluster or DFS, many unit tests fail because Namenode in MiniDFSCluster fails to bind to the right port. Most likely HADOOP-2185 forgot to set right defaults for MiniDFSCluster. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2329) [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data
[ https://issues.apache.org/jira/browse/HADOOP-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548881 ] Edward Yoon commented on HADOOP-2329: - OK, I see, Jim. But I don't understand the opposition to shell operations. :) I think there can be no cause for complaint; the shell tool isn't threatening pure HBase. [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data Key: HADOOP-2329 URL: https://issues.apache.org/jira/browse/HADOOP-2329 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 A built-in data type is a fundamental data type that the hbase shell defines. (character strings, scalars, ranges, arrays, ... , etc) If you need a specialized data type that is not currently provided as a built-in type, you are encouraged to write your own user-defined data type using UDC (not yet implemented). (or contribute it for distribution in a future release of hbase shell) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Reprioritize HBase issues in JIRA
I don't mean to ruffle any feathers. I just want to make sure that the really critical issues are labeled as such. In order to try and clarify what I think the priorities should mean, here's a wiki page I put together: http://wiki.apache.org/lucene-hadoop/Hbase/IssuePriorityGuidelines#preview If I'm way off base on these categories, let me know. -Bryan On Dec 5, 2007, at 3:18 PM, edward yoon wrote: I think this is a very subjective judgment of some application ability. -- B. Regards, Edward yoon @ NHN, corp. Home : http://www.udanax.org From: [EMAIL PROTECTED] To: hadoop-dev@lucene.apache.org Date: Wed, 5 Dec 2007 15:01:41 -0800 Subject: RE: Reprioritize HBase issues in JIRA Yes, that would be a big help. Go for it! And thanks for the help. --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: Bryan Duxbury [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 05, 2007 2:59 PM To: hadoop-dev@lucene.apache.org Subject: Reprioritize HBase issues in JIRA Hey all, It seems to me like there's a lot of mis-prioritized HBase issues in the JIRA at the moment, since the default is Major. I'd like to give it a once-over and reprioritize the tickets, if no one objects. I think it would make our project easier to assess at a glance. -Bryan Duxbury
[jira] Updated: (HADOOP-2362) [hbase] Leaking hdfs file handle
[ https://issues.apache.org/jira/browse/HADOOP-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HADOOP-2362: -- Attachment: 2362.patch HADOOP-2362 Leaking hdfs file handle M src/contrib/hbase/src/test/org/apache/hadoop/hbase/TestScanner2.java HRegion.createHRegion API changed. M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStoreFile.java (obtainNewHStoreFile): Remove duplicated code. (writeInfo): No need to wrap FSDataOutputStream in a DataOutputStream. M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStore.java No need to wrap FSDataOutputStream in a DataOutputStream. M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HRegionServer.java Remove useless log. Do explicit imports instead of importing whole packages. M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMaster.java HRegion.createHRegion API changed. M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HServerInfo.java M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HServerAddress.java M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HAbstractScanner.java M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMsg.java Do explicit imports instead of importing whole packages. M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HRegion.java Close daughter regions after opening them in split. (createHRegion): No need of initialFiles argument. [hbase] Leaking hdfs file handle Key: HADOOP-2362 URL: https://issues.apache.org/jira/browse/HADOOP-2362 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Priority: Minor Fix For: 0.16.0 Attachments: 2362.patch Found a leaking filehandle researching HADOOP-2341. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2342) create a micro-benchmark for measure local-file versus hdfs read
[ https://issues.apache.org/jira/browse/HADOOP-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548887 ] Hadoop QA commented on HADOOP-2342: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12371075/throughput.patch against trunk revision r601518. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests -1. The patch failed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1276/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1276/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1276/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1276/console This message is automatically generated. create a micro-benchmark for measure local-file versus hdfs read Key: HADOOP-2342 URL: https://issues.apache.org/jira/browse/HADOOP-2342 Project: Hadoop Issue Type: Test Components: dfs Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.16.0 Attachments: throughput.patch We should have a benchmark that measures reading a 10g file from hdfs and from local disk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
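The core of such a micro-benchmark is just timed sequential reads; a hedged sketch is below. The 64 KB buffer size and the command-line interface are assumptions — a real run would read the 10 GB file once from local disk and once through the HDFS client stream, ideally on a cold cache, and compare the two rates.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative read-throughput harness; not the HADOOP-2342 patch itself.
public class ReadThroughput {
    // Reads the whole stream in 64 KB chunks; returns MB/s.
    public static double measure(InputStream in) throws IOException {
        byte[] buf = new byte[64 * 1024];
        long bytes = 0;
        long start = System.nanoTime();
        int n;
        while ((n = in.read(buf)) > 0) {
            bytes += n;
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return (bytes / (1024.0 * 1024.0)) / Math.max(seconds, 1e-9);
    }

    public static void main(String[] args) throws IOException {
        // For a local-file run; an HDFS run would open the stream via the
        // DFS client instead.
        Path p = Path.of(args[0]);
        try (InputStream in = Files.newInputStream(p)) {
            System.out.printf("%s: %.1f MB/s%n", p, measure(in));
        }
    }
}
```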
[jira] Updated: (HADOOP-1707) Remove the DFS Client disk-based cache
[ https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HADOOP-1707: - Attachment: clientDiskBuffer11.patch Merged with latest trunk. Remove the DFS Client disk-based cache -- Key: HADOOP-1707 URL: https://issues.apache.org/jira/browse/HADOOP-1707 Project: Hadoop Issue Type: Improvement Components: dfs Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.16.0 Attachments: clientDiskBuffer.patch, clientDiskBuffer10.patch, clientDiskBuffer11.patch, clientDiskBuffer2.patch, clientDiskBuffer6.patch, clientDiskBuffer7.patch, clientDiskBuffer8.patch, clientDiskBuffer9.patch, DataTransferProtocol.doc, DataTransferProtocol.html The DFS client currently uses a staging file on local disk to cache all user-writes to a file. When the staging file accumulates 1 block worth of data, its contents are flushed to a HDFS datanode. These operations occur sequentially. A simple optimization of allowing the user to write to another staging file while simultaneously uploading the contents of the first staging file to HDFS will improve file-upload performance. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
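The optimization described — let the user keep writing into a second staging buffer while the first is uploaded — is classic double buffering. A minimal sketch, using a bounded queue and a `StringBuilder` as a stand-in for the datanode sink; all names are hypothetical, and the real patch stages through files, not in-memory byte arrays.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative double-buffered writer: filling and uploading overlap.
public class DoubleBufferedWriter {
    private final BlockingQueue<byte[]> toUpload;
    private final Thread uploader;
    private final StringBuilder uploaded = new StringBuilder(); // fake sink

    public DoubleBufferedWriter() {
        // Capacity 1 means at most one full buffer waits while another fills:
        // the classic double-buffer arrangement.
        this.toUpload = new ArrayBlockingQueue<>(1);
        this.uploader = new Thread(() -> {
            try {
                while (true) {
                    byte[] block = toUpload.take();
                    if (block.length == 0) return;      // poison pill: done
                    uploaded.append(new String(block)); // "send to datanode"
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        uploader.start();
    }

    // Called when a staging buffer reaches one block's worth of data; only
    // blocks if the uploader has fallen a full buffer behind.
    public void flushBlock(byte[] block) throws InterruptedException {
        toUpload.put(block);
    }

    // Drains remaining work and returns everything "uploaded", in order.
    public String close() throws InterruptedException {
        toUpload.put(new byte[0]);
        uploader.join();
        return uploaded.toString();
    }
}
```

Because the queue preserves order and `close` waits for the uploader, the overlap never reorders or drops block data — the property the real client must also guarantee.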
[jira] Updated: (HADOOP-2362) [hbase] Leaking hdfs file handle
[ https://issues.apache.org/jira/browse/HADOOP-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HADOOP-2362: -- Status: Patch Available (was: Open) Tests pass locally. [hbase] Leaking hdfs file handle Key: HADOOP-2362 URL: https://issues.apache.org/jira/browse/HADOOP-2362 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Priority: Minor Fix For: 0.16.0 Attachments: 2362.patch Found a leaking filehandle researching HADOOP-2341. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2329) [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data
[ https://issues.apache.org/jira/browse/HADOOP-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_1254 ] Jim Kellerman commented on HADOOP-2329: --- Edward, "But, I don't know about the movement opposed to shell operations." I don't think there is opposition to what you are doing, other than some people feel that the advanced shell operations are not necessary in a basic shell that can do simple queries and administrative functions. If the advanced features could be packaged in a separate jar and loaded via some command line option, I think it would gain higher acceptance. "I think there can be no cause for complaint. The shell tool isn't threatening a pure Hbase." I think I am misunderstanding something here. Are you proposing to do the data types entirely outside of HBase or leveraging HADOOP-2197? Or do you want internal support for data types? If you are thinking of the former, that's fine. But I don't think support for data types should be in the core of HBase. [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data Key: HADOOP-2329 URL: https://issues.apache.org/jira/browse/HADOOP-2329 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 A built-in data type is a fundamental data type that the hbase shell defines (character strings, scalars, ranges, arrays, etc.). If you need a specialized data type that is not currently provided as a built-in type, you are encouraged to write your own user-defined data type using UDC (not yet implemented), or contribute it for distribution in a future release of hbase shell. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2343) [hbase] Stuck regionserver?
[ https://issues.apache.org/jira/browse/HADOOP-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2343: -- Priority: Major (was: Minor) Affects cluster stability, but cluster recovers on restart, so changing to major. [hbase] Stuck regionserver? --- Key: HADOOP-2343 URL: https://issues.apache.org/jira/browse/HADOOP-2343 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Assignee: stack Looking in logs, a regionserver went down because it could not contact the master after 60 seconds. Watching logging, the HRS is repeatedly checking all 150 loaded regions over and over again w/ a pause of about 5 seconds between runs... then there is a suspicious 60+ second gap with no logging as though the regionserver had hung up on something: {code} 2007-12-03 13:14:54,178 DEBUG hbase.HRegionServer - flushing region postlog,img151/60/plakatlepperduzy1hh7.jpg,1196614355635 2007-12-03 13:14:54,178 DEBUG hbase.HRegion - Not flushing cache for region postlog,img151/60/plakatlepperduzy1hh7.jpg,1196614355635: snapshotMemcaches() determined that there was nothing to do 2007-12-03 13:14:54,205 DEBUG hbase.HRegionServer - flushing region postlog,img247/230/seanpaul4li.jpg,1196615889965 2007-12-03 13:14:54,205 DEBUG hbase.HRegion - Not flushing cache for region postlog,img247/230/seanpaul4li.jpg,1196615889965: snapshotMemcaches() determined that there was nothing to do 2007-12-03 13:16:04,305 FATAL hbase.HRegionServer - unable to report to master for 67467 milliseconds - aborting server 2007-12-03 13:16:04,455 INFO hbase.Leases - regionserver/0:0:0:0:0:0:0:0:60020 closing leases 2007-12-03 13:16:04,455 INFO hbase.Leases$LeaseMonitor - regionserver/0:0:0:0:0:0:0:0:60020.leaseChecker exiting {code} Master seems to be running fine scanning its ~700 regions. Then you see this in log, before the HRS shuts itself down. 
{code} 2007-12-03 13:14:31,416 INFO hbase.Leases - HMaster.leaseChecker lease expired 153260899/153260899 2007-12-03 13:14:31,417 INFO hbase.HMaster - XX.XX.XX.102:60020 lease expired {code} ... and we go on to process shutdown. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2364) when hbase regionserver restarts, it says impossible state for createLease()
when hbase regionserver restarts, it says impossible state for createLease() -- Key: HADOOP-2364 URL: https://issues.apache.org/jira/browse/HADOOP-2364 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Michael Bieniosek Priority: Minor I restarted a regionserver, and got this error in its logs: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.AssertionError: Impossible state for createLease(): Lease -435227488/-435227488 is still held. at org.apache.hadoop.hbase.Leases.createLease(Leases.java:145) at org.apache.hadoop.hbase.HMaster.regionServerStartup(HMaster.java:1278) at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) at org.apache.hadoop.ipc.Client.call(Client.java:482) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184) at $Proxy0.regionServerStartup(Unknown Source) at org.apache.hadoop.hbase.HRegionServer.reportForDuty(HRegionServer.java:1025) at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:659) at java.lang.Thread.run(Unknown Source) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2329) [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data
[ https://issues.apache.org/jira/browse/HADOOP-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548906 ] Jim Kellerman commented on HADOOP-2329: --- Since you are proposing the former rather than the latter, I would say go for it. [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data Key: HADOOP-2329 URL: https://issues.apache.org/jira/browse/HADOOP-2329 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 A built-in data type is a fundamental data type that the hbase shell defines (character strings, scalars, ranges, arrays, etc.). If you need a specialized data type that is not currently provided as a built-in type, you are encouraged to write your own user-defined data type using UDC (not yet implemented), or contribute it for distribution in a future release of hbase shell. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2362) [hbase] Leaking hdfs file handle
[ https://issues.apache.org/jira/browse/HADOOP-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548907 ] Hadoop QA commented on HADOOP-2362: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12371086/2362.patch against trunk revision r601518. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests -1. The patch failed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1277/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1277/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1277/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1277/console This message is automatically generated. [hbase] Leaking hdfs file handle Key: HADOOP-2362 URL: https://issues.apache.org/jira/browse/HADOOP-2362 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Priority: Minor Fix For: 0.16.0 Attachments: 2362.patch Found a leaking filehandle researching HADOOP-2341. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2329) [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data
[ https://issues.apache.org/jira/browse/HADOOP-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548908 ] Edward Yoon commented on HADOOP-2329: - Thanks for your advice. [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data Key: HADOOP-2329 URL: https://issues.apache.org/jira/browse/HADOOP-2329 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 A built-in data type is a fundamental data type that the hbase shell defines (character strings, scalars, ranges, arrays, etc.). If you need a specialized data type that is not currently provided as a built-in type, you are encouraged to write your own user-defined data type using UDC (not yet implemented), or contribute it for distribution in a future release of hbase shell. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1707) Remove the DFS Client disk-based cache
[ https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548911 ] Konstantin Shvachko commented on HADOOP-1707: - I think this patch has been tested quite thoroughly, and I don't see any algorithmic flaws in it. The logic is fairly complicated though, so imo (1) we need better documentation, either in JavaDoc or at least in Jira; (2) it would be good if you could extract common actions for the client and the data-node into separate classes, not inner ones. === DFSClient.java - DFSClient: 4 unused variables, members. - DFSOutputStream.lb should be a local variable. - processDatanodeError() and DFSOutputStream.close() have common code. - BlockReader.readChunk() {code} 07/12/04 18:36:22 INFO fs.FSInputChecker: DFSClient readChunk got seqno 14 offsetInBlock 7168 {code} Should be DEBUG. - More comments: what are dataQueue, ackQueue, bytesCurBlock? - Some new members in DFSOutputStream can be calculated from the others. No need to store them all. See e.g. {code} private int packetSize = 0; private int chunksPerPacket = 0; private int chunksPerBlock = 0; private int chunkSize = 0; {code} - In the line below, 8 should be defined as a constant; otherwise its meaning is not clear. {code} chunkSize = bytesPerChecksum + 8; // user data + checksum {code} - currentPacket should be a local variable of writeChunk(). - The 4 in the code snippet below looks mysterious: {code} if (len + cklen + 4 > chunkSize) { {code} - Why start ResponseProcessor in processDatanodeError()? - Some methods should be moved into new inner classes; e.g. nextBlockOutputStream() should be a part of DataStreamer. - Packet should be factored out to a separate class (named probably DataPacket). It should have serialization/deserialization methods for the packet header, which should be reused in DFSClient and DataNodes for consistency in data transfer.
It also should have methods readPacket() and writePacket(). === DataNode.java - import org.apache.hadoop.io.Text; is redundant. - My Eclipse shows 5 variables that are never read. - Rather than using 4 on several occasions, a constant should be defined {code} SIZE_OF_INTEGER = Integer.SIZE / Byte.SIZE; {code} and used whenever required. - lastDataNodeRun() should not be public. === FSDataset.java - writeToBlock(): these are two searches in a map instead of one. {code} if (ongoingCreates.containsKey(b)) { ActiveFile activeFile = ongoingCreates.get(b); {code} - unfinalizeBlock(): I kinda find the name funny. === General - Convert comments like // ... to JavaDoc /** ... */ style comments when used as method or class headers, even if they are private. - Formatting: tabs should be replaced by 2 spaces, e.g. in ResponseProcessor.run(), DataStreamer.run(). - Formatting: long lines. Remove the DFS Client disk-based cache -- Key: HADOOP-1707 URL: https://issues.apache.org/jira/browse/HADOOP-1707 Project: Hadoop Issue Type: Improvement Components: dfs Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.16.0 Attachments: clientDiskBuffer.patch, clientDiskBuffer10.patch, clientDiskBuffer11.patch, clientDiskBuffer2.patch, clientDiskBuffer6.patch, clientDiskBuffer7.patch, clientDiskBuffer8.patch, clientDiskBuffer9.patch, DataTransferProtocol.doc, DataTransferProtocol.html The DFS client currently uses a staging file on local disk to cache all user-writes to a file. When the staging file accumulates 1 block worth of data, its contents are flushed to a HDFS datanode. These operations occur sequentially. A simple optimization of allowing the user to write to another staging file while simultaneously uploading the contents of the first staging file to HDFS will improve file-upload performance. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
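Konstantin's suggestion about the magic 4 is a standard refactor; a minimal sketch of it follows, where the class name and the chunkFits() helper are hypothetical illustrations, not code from the patch:

```java
// Replace scattered magic 4s with a named constant for the size of a
// serialized int (e.g. a length field alongside data and checksum bytes).
public class PacketConstants {
    public static final int SIZE_OF_INTEGER = Integer.SIZE / Byte.SIZE; // 4 bytes

    // Illustrative use: does one more chunk (data + checksum + int field) fit?
    public static boolean chunkFits(int len, int cklen, int chunkSize) {
        return len + cklen + SIZE_OF_INTEGER <= chunkSize;
    }
}
```

The gain is purely readability: the comparison now states what the 4 bytes are for, and a change in the field width touches one definition.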
RE: [jira] Commented: (HADOOP-2006) Aggregate Functions in select statement
Sorry for my mistake... What I meant was: if you assume that standard SQL is the whole of a DBMS's capability, then you won't want to study that DBMS's structure, access algorithms, philosophies, etc. Can I force you to use 100% of the DBMS's capability? Or, let's assume the DBMS didn't provide standard SQL. Would you still want to use it? If you would, then you already understand that SQL isn't all there is to the DBMS. So, the conclusion? The richer the hbase shell, the more rapidly the use of hbase will grow. -- B. Regards, Edward yoon @ NHN, corp. Home : http://www.udanax.org From: [EMAIL PROTECTED] To: hadoop-dev@lucene.apache.org Subject: RE: [jira] Commented: (HADOOP-2006) Aggregate Functions in select statement Date: Thu, 6 Dec 2007 01:10:14 + "it will encourage people to think that the shell is a good way to interact with HBase in general..." I think this is a key point. :) The Hbase Shell's aim is to improve work efficiency without requiring specialized knowledge. I'll make an accessory for database access methods on Hbase. Also, I'm thinking about matrix operations on Hbase. But Hbase Shell is just one of the applications on Hbase. Let's think: if you assume that standard SQL is the whole of a DBMS's capability, then you won't want to study that DBMS's structure, access algorithms, philosophies, etc. Can I force you to use 100% of the DBMS's capability? Or, let's assume the DBMS didn't provide standard SQL. Would you still want to use it? If you would, then you already understand that SQL isn't all there is to the DBMS. So, the conclusion? The richer the hbase shell, the more rapidly the use of hbase will grow. -- B. Regards, Edward yoon @ NHN, corp.
Home : http://www.udanax.org From: [EMAIL PROTECTED] Subject: Re: [jira] Commented: (HADOOP-2006) Aggregate Functions in select statement Date: Wed, 5 Dec 2007 15:50:50 -0800 To: hadoop-dev@lucene.apache.org If you have a table with something like a billion rows, and do an aggregate function on the table from the shell, you will end up reading all billion rows through a single machine, essentially aggregating the entire dataset locally. This defeats the purpose of having a massively distributed database like HBase. To do this more efficiently, you'd ideally kick off a MapReduce job that can perform the various aggregation functions on the dataset in parallel, harnessing the power of the distributed dataset, and then returning the results to a central location once they are calculated. I think putting this option into the shell is risky, because it will encourage people to think that the shell is a good way to interact with HBase in general, which it isn't. We want people to understand HBase is best consumed in parallel and discourage solutions that aggregate access through a single point. As such, we shouldn't build features that allow people to inadvertently use the wrong access patterns. On Dec 5, 2007, at 3:38 PM, Edward Yoon (JIRA) wrote: [ https://issues.apache.org/jira/browse/HADOOP-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548879 ] Edward Yoon commented on HADOOP-2006: - I don't understand your comment. Please explain further. Aggregate Functions in select statement --- Key: HADOOP-2006 URL: https://issues.apache.org/jira/browse/HADOOP-2006 Project: Hadoop Issue Type: Sub-task Components: contrib/hbase Affects Versions: 0.14.1 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Minor Fix For: 0.16.0 Aggregation functions on collections of data values: average, minimum, maximum, sum, count. Group rows by the value of a column family and apply an aggregate function independently to each group of rows.
* ƒ ~function_list~ (Relation) {code} select producer, avg(year) from movieLog_table group by producer {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
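The `select producer, avg(year) ... group by producer` example can be mimicked locally in plain Java to make concrete what the aggregation computes; at HBase scale this grouping key would become the map output key and the averaging the reduce step of the MapReduce job described above. The class and row layout here are illustrative assumptions:

```java
import java.util.*;
import java.util.stream.Collectors;

// Local sketch of `select producer, avg(year) ... group by producer`:
// group rows by the producer column and average the year column.
public class GroupAvg {
    // each row is {producer, year}
    public static Map<String, Double> avgYearByProducer(List<String[]> rows) {
        return rows.stream().collect(Collectors.groupingBy(
            r -> r[0],                                            // group key: producer
            Collectors.averagingDouble(r -> Double.parseDouble(r[1])))); // avg(year)
    }
}
```

Pulled into the shell, this whole computation runs on one machine; distributed across reducers keyed by producer, each group is averaged in parallel, which is the access pattern the thread argues for.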
[jira] Assigned: (HADOOP-1550) [hbase] No means of deleting a 'row' nor all members of a column family
[ https://issues.apache.org/jira/browse/HADOOP-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury reassigned HADOOP-1550: - Assignee: Bryan Duxbury [hbase] No means of deleting a 'row' nor all members of a column family -- Key: HADOOP-1550 URL: https://issues.apache.org/jira/browse/HADOOP-1550 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: stack Assignee: Bryan Duxbury There is no support in hbase currently for deleting a row -- i.e. removing all columns and their versions keyed by a particular row id. Nor is there a means of passing in a row id and column family name and having hbase delete all members of the column family (for the designated row). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Work started: (HADOOP-1550) [hbase] No means of deleting a 'row' nor all members of a column family
[ https://issues.apache.org/jira/browse/HADOOP-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HADOOP-1550 started by Bryan Duxbury. [hbase] No means of deleting a 'row' nor all members of a column family -- Key: HADOOP-1550 URL: https://issues.apache.org/jira/browse/HADOOP-1550 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: stack Assignee: Bryan Duxbury There is no support in hbase currently for deleting a row -- i.e. removing all columns and their versions keyed by a particular row id. Nor is there a means of passing in a row id and column family name and having hbase delete all members of the column family (for the designated row). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Jaql: a JSON query language
IBM Almaden is pleased to announce Jaql, a query language for JSON data. An introduction to Jaql and a prototype that integrates Jaql with Hadoop's map/reduce and HBase is available at http://www.jaql.org. A more detailed technical description is forthcoming. Jaql is still an early draft specification, so beware that it is likely to change over the next few months. Enjoy!