[jira] Commented: (HADOOP-2329) [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data
[ https://issues.apache.org/jira/browse/HADOOP-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548573 ] Edward Yoon commented on HADOOP-2329: -
{code}
row   bikecar
row1  bike:name            Harley Davidson
...   bike:cc              800
...   bike:price           23,000
      bike:price_currency  U.S. dollar
...
{code}
For this case, I'm thinking of a different method for each type definition. [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data Key: HADOOP-2329 URL: https://issues.apache.org/jira/browse/HADOOP-2329 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Fix For: 0.16.0 A built-in data type is a fundamental data type that the hbase shell defines (character strings, scalars, ranges, arrays, etc.). If you need a specialized data type that is not currently provided as a built-in type, you are encouraged to write your own user-defined data type using UDC (not yet implemented), or contribute it for distribution in a future release of hbase shell. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
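A built-in numeric type would presumably need a byte encoding. As an illustration only (the issue does not specify an encoding, and the class and method names here are invented), a fixed-width big-endian encoding would let a value such as bike:cc 800 be stored and compared as a number rather than as the string "800":

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch of what a built-in numeric value type could do.
 * Fixed-width big-endian ints round-trip exactly and compare byte-wise
 * in numeric order, unlike decimal strings.
 */
public class IntValueType {
  public static byte[] encode(int value) {
    return ByteBuffer.allocate(4).putInt(value).array(); // big-endian by default
  }

  public static int decode(byte[] bytes) {
    return ByteBuffer.wrap(bytes).getInt();
  }
}
```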
[jira] Created: (HADOOP-2354) Add job-level counters for the launched speculative tasks
Add job-level counters for the launched speculative tasks - Key: HADOOP-2354 URL: https://issues.apache.org/jira/browse/HADOOP-2354 Project: Hadoop Issue Type: Improvement Components: mapred Reporter: Arun C Murthy Assignee: Arun C Murthy Fix For: 0.16.0 Add job-level counters for the launched speculative tasks; this should help track them. Ideally we would also have counters to check how many of the speculative tasks completed before the original task (thereby helping validate the strategy for launching speculative tasks), however we do not have this infrastructure yet (HADOOP-544) - so I'll file a follow-on bug for that feature. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2338) [hbase] NPE in master server
[ https://issues.apache.org/jira/browse/HADOOP-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548575 ] Hadoop QA commented on HADOOP-2338: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12370997/patch.txt against trunk revision r601221. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests -1. The patch failed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1267/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1267/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1267/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1267/console This message is automatically generated. [hbase] NPE in master server Key: HADOOP-2338 URL: https://issues.apache.org/jira/browse/HADOOP-2338 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Jim Kellerman Assignee: Jim Kellerman Fix For: 0.16.0 Attachments: master.log.gz, patch.txt Master gets an NPE after receiving multiple responses from the same server telling the master it has opened a region. 
{code}
2007-12-02 20:31:37,515 DEBUG hbase.HRegion - Next sequence id for region postlog,img254/577/02suecia024richardburnson0.jpg,1196619667879 is 73377537
2007-12-02 20:31:37,517 INFO hbase.HRegion - region postlog,img254/577/02suecia024richardburnson0.jpg,1196619667879 available
2007-12-02 20:31:39,200 WARN hbase.HRegionServer - Processing message (Retry: 0)
java.io.IOException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hbase.HMaster.processMsgs(HMaster.java:1484)
  at org.apache.hadoop.hbase.HMaster.regionServerReport(HMaster.java:1423)
  at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
  at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
  at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48)
  at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:759)
  at java.lang.Thread.run(Thread.java:619)

         case HMsg.MSG_REPORT_PROCESS_OPEN:
           synchronized (this.assignAttempts) {
             // Region server has acknowledged request to open region.
             // Extend region open time by 1/2 max region open time.
**1484**     assignAttempts.put(region.getRegionName(),
                 Long.valueOf(assignAttempts.get(region.getRegionName()).longValue()
                     + (this.maxRegionOpenTime / 2)));
           }
           break;
{code}
-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
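The trace points at the assignAttempts.get(...) call at HMaster.java:1484: when a duplicate MSG_REPORT_PROCESS_OPEN arrives after the region has already been removed from assignAttempts, get() returns null and the longValue() unboxing throws. A minimal null-guard sketch of that idea (illustrative only, with plain Java collections standing in for the HMaster internals; this is not the attached patch.txt):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch, not the HADOOP-2338 patch: a duplicate "region
 * opened" report can arrive after the region was removed from the
 * pending-assignment map, so the lookup must tolerate null instead of
 * unboxing it.
 */
public class AssignAttemptsGuard {
  /** Extends the open deadline; returns false if the region is no longer pending. */
  public static boolean extendOpenTime(Map<String, Long> assignAttempts,
                                       String regionName, long maxRegionOpenTime) {
    synchronized (assignAttempts) {
      Long current = assignAttempts.get(regionName);
      if (current == null) {
        return false; // duplicate or stale report: nothing to extend, no NPE
      }
      assignAttempts.put(regionName, current + maxRegionOpenTime / 2);
      return true;
    }
  }

  public static void main(String[] args) {
    Map<String, Long> attempts = new HashMap<>();
    attempts.put("postlog,row1", 1000L);
    System.out.println(extendOpenTime(attempts, "postlog,row1", 600));
    System.out.println(extendOpenTime(attempts, "unknown-region", 600));
  }
}
```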
[jira] Commented: (HADOOP-2349) FSEditLog.logEdit(byte op, Writable w1, Writable w2) should accept a variable number of Writables, instead of two.
[ https://issues.apache.org/jira/browse/HADOOP-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548607 ] Hadoop QA commented on HADOOP-2349: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12370983/2349_20071204.patch against trunk revision r601232. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests -1. The patch failed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1268/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1268/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1268/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1268/console This message is automatically generated. FSEditLog.logEdit(byte op, Writable w1, Writable w2) should accept a variable number of Writables, instead of two. Key: HADOOP-2349 URL: https://issues.apache.org/jira/browse/HADOOP-2349 Project: Hadoop Issue Type: Improvement Components: dfs Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Priority: Minor Attachments: 2349_20071204.patch The new declaration should be {code} FSEditLog.logEdit(byte op, Writable ... w) {code} No Writable parameter may be null. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
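The proposed varargs shape might look like the following sketch (an assumption on my part, not the attached 2349_20071204.patch; a plain Object stands in for Hadoop's Writable so the example is self-contained):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the varargs signature the issue proposes: one parameter of
 * type Object... replaces the fixed pair (w1, w2), and nulls are
 * rejected up front as the issue requires.
 */
public class EditLogSketch {
  public final List<String> log = new ArrayList<>();

  /** Accepts any number of records; none may be null. */
  public void logEdit(byte op, Object... writables) {
    for (Object w : writables) {
      if (w == null) {
        throw new IllegalArgumentException("Writable parameters must not be null");
      }
    }
    log.add("op=" + op + " records=" + writables.length);
  }
}
```

A caller can then write logEdit(op, src) or logEdit(op, src, dst, timestamp) without new overloads for each arity.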
[jira] Commented: (HADOOP-2339) [Hbase Shell] Delete command with no WHERE clause
[ https://issues.apache.org/jira/browse/HADOOP-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548640 ] Hadoop QA commented on HADOOP-2339: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12370926/2339_v04.patch against trunk revision r601232. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests -1. The patch failed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1270/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1270/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1270/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1270/console This message is automatically generated. [Hbase Shell] Delete command with no WHERE clause - Key: HADOOP-2339 URL: https://issues.apache.org/jira/browse/HADOOP-2339 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Fix For: 0.16.0 Attachments: 2339.patch, 2339_v02.patch, 2339_v03.patch, 2339_v04.patch using HbaseAdmin.deleteColumn() method. {code} DELETE column_name FROM table_name; {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2356) Set memcache flush size per table
Set memcache flush size per table - Key: HADOOP-2356 URL: https://issues.apache.org/jira/browse/HADOOP-2356 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: Paul Saab Priority: Minor The amount of memory taken by the memcache before a flush is currently a global parameter. It should be configurable per-table. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2357) [hbase] Compaction cleanup; less deleting + prevent possible file leaks
[hbase] Compaction cleanup; less deleting + prevent possible file leaks --- Key: HADOOP-2357 URL: https://issues.apache.org/jira/browse/HADOOP-2357 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: stack Priority: Minor Fix For: 0.16.0 This issue is being created so I can commit the compaction patch that just passed hudson over in HADOOP-2283. That issue is about trouble accessing hdfs. It should stay open since we haven't yet figured out what's up. As a by-product of the investigation, the compaction patch was generated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2338) [hbase] NPE in master server
[ https://issues.apache.org/jira/browse/HADOOP-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548737 ] Jim Kellerman commented on HADOOP-2338: --- "If the region server fails to record the split in META, there is no other means for the master to find the daughter regions. What happens on restart? We pick up the parent again? The daughter regions will be bypassed?" Yes, there is a real potential for corrupting hbase in this case. Hence the quiesce shutdown mechanism proposed above. [hbase] NPE in master server Key: HADOOP-2338 URL: https://issues.apache.org/jira/browse/HADOOP-2338 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Jim Kellerman Assignee: Jim Kellerman Fix For: 0.16.0 Attachments: master.log.gz, patch.txt Master gets an NPE after receiving multiple responses from the same server telling the master it has opened a region. (Log and code excerpt elided; see the first HADOOP-2338 message above.) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2338) [hbase] NPE in master server
[ https://issues.apache.org/jira/browse/HADOOP-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548732 ] stack commented on HADOOP-2338: --- If the region server fails to record the split in META, there is no other means for the master to find the daughter regions. What happens on restart? We pick up the parent again? The daughter regions will be bypassed? [hbase] NPE in master server Key: HADOOP-2338 URL: https://issues.apache.org/jira/browse/HADOOP-2338 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Jim Kellerman Assignee: Jim Kellerman Fix For: 0.16.0 Attachments: master.log.gz, patch.txt Master gets an NPE after receiving multiple responses from the same server telling the master it has opened a region. (Log and code excerpt elided; see the first HADOOP-2338 message above.) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2348) [hbase] lock_id in HTable.startUpdate and commit/abort is misleading and useless
[ https://issues.apache.org/jira/browse/HADOOP-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2348: -- Component/s: contrib/hbase [hbase] lock_id in HTable.startUpdate and commit/abort is misleading and useless Key: HADOOP-2348 URL: https://issues.apache.org/jira/browse/HADOOP-2348 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: Bryan Duxbury Assignee: Jim Kellerman Priority: Minor In the past, the lock id returned by HTable.startUpdate was a real lock id from a remote server. However, that has been superseded by the BatchUpdate process, so now the lock id is just an arbitrary value. Moreover, it doesn't actually add any value: while it implies that you could start two updates on the same HTable and commit them separately, this is in fact not the case. Any attempt to do a second startUpdate throws an IllegalStateException. Since there is no added functionality afforded by the presence of this parameter, I suggest that we overload all methods that use it to ignore it and print a deprecation notice. startUpdate can just return a constant like 1 and eventually turn into a boolean or some other useful value. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
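The suggested deprecation could be sketched like this (hypothetical stand-in class; the real methods live on org.apache.hadoop.hbase.HTable and take HBase types):

```java
/**
 * Sketch of the deprecation approach the issue suggests: startUpdate
 * returns a meaningless constant, the lock-id-taking commit is kept only
 * for source compatibility and ignores its argument, and a second
 * startUpdate before commit/abort still throws.
 */
public class LockIdSketch {
  private boolean updateInProgress = false;

  /** Always returns 1; the value carries no meaning. */
  public long startUpdate(String row) {
    if (updateInProgress) {
      throw new IllegalStateException("update already in progress");
    }
    updateInProgress = true;
    return 1L;
  }

  /** @deprecated lockId is ignored; use {@link #commit()} instead. */
  @Deprecated
  public void commit(long lockId) {
    commit(); // the argument is deliberately unused
  }

  public void commit() {
    updateInProgress = false;
  }
}
```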
[jira] Commented: (HADOOP-2338) [hbase] NPE in master server
[ https://issues.apache.org/jira/browse/HADOOP-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548736 ] Jim Kellerman commented on HADOOP-2338: --- Here's a rough outline of how cluster shutdown should work:
- master receives shutdown request
- as each region server reports in, the master instructs the region server to 'quiesce'. This means that the region server should stop accepting requests for user regions, and close them. If it has no meta regions, it reports back to the master that it is exiting. Otherwise it reports that it is quiesced.
- once there are only quiesced region servers running, the master instructs them to shut down. They close the meta regions and tell the master that they have exited.
- when there are no more active region servers, the master can then shut down.
[hbase] NPE in master server Key: HADOOP-2338 URL: https://issues.apache.org/jira/browse/HADOOP-2338 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Jim Kellerman Assignee: Jim Kellerman Fix For: 0.16.0 Attachments: master.log.gz, patch.txt Master gets an NPE after receiving multiple responses from the same server telling the master it has opened a region.
(Log and code excerpt elided; see the first HADOOP-2338 message above.) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
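The quiesce outline in the comment above can be sketched as a toy state machine (all names here are invented for illustration; this is not HMaster's actual API):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of the proposed shutdown protocol: on shutdown, serving
 * servers quiesce (or exit immediately if they hold no meta regions);
 * once no server is still serving, quiesced servers are told to exit;
 * the master may exit only when every server has exited.
 */
public class QuiesceShutdown {
  public enum State { SERVING, QUIESCED, EXITED }

  public final Map<String, State> servers = new HashMap<>();
  private final Map<String, Boolean> hasMeta = new HashMap<>();
  private boolean shutdownRequested = false;

  public void addServer(String name, boolean servesMeta) {
    servers.put(name, State.SERVING);
    hasMeta.put(name, servesMeta);
  }

  public void requestShutdown() { shutdownRequested = true; }

  /** Called when a server reports in; returns the state it moves to. */
  public State report(String name) {
    State s = servers.get(name);
    if (!shutdownRequested) return s;
    if (s == State.SERVING) {
      // Close user regions; a server with no meta regions exits right away.
      s = hasMeta.get(name) ? State.QUIESCED : State.EXITED;
    } else if (s == State.QUIESCED && noneServing()) {
      s = State.EXITED; // master tells quiesced servers to close meta + exit
    }
    servers.put(name, s);
    return s;
  }

  public boolean noneServing() {
    return servers.values().stream().noneMatch(s -> s == State.SERVING);
  }

  public boolean masterMayExit() {
    return servers.values().stream().allMatch(s -> s == State.EXITED);
  }
}
```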
[jira] Updated: (HADOOP-2342) create a micro-benchmark for measuring local-file versus hdfs read
[ https://issues.apache.org/jira/browse/HADOOP-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-2342: -- Attachment: throughput.patch This benchmark reads and writes files using java.io, RawLocalFileSystem, LocalFileSystem, and HDFS and reports the time. create a micro-benchmark for measuring local-file versus hdfs read Key: HADOOP-2342 URL: https://issues.apache.org/jira/browse/HADOOP-2342 Project: Hadoop Issue Type: Test Components: dfs Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.16.0 Attachments: throughput.patch We should have a benchmark that measures reading a 10g file from hdfs and from local disk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
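The timing loop such a benchmark needs can be sketched with java.io alone (an assumed shape, not the attached throughput.patch, which also exercises RawLocalFileSystem, LocalFileSystem and HDFS):

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/**
 * Minimal read-throughput harness: write a test file, then time a
 * buffered sequential read and report the bytes actually read.
 * Checked IOExceptions are wrapped so callers stay simple.
 */
public class ReadThroughput {
  public static long timedRead(File f) {
    long total = 0;
    byte[] buf = new byte[64 * 1024];
    long start = System.nanoTime();
    try (InputStream in = new BufferedInputStream(new FileInputStream(f))) {
      int n;
      while ((n = in.read(buf)) > 0) total += n;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    long micros = (System.nanoTime() - start) / 1000;
    System.out.println("read " + total + " bytes in " + micros + " us");
    return total;
  }

  public static File writeTestFile(int bytes) {
    try {
      File f = File.createTempFile("throughput", ".dat");
      f.deleteOnExit();
      try (OutputStream out = new BufferedOutputStream(new FileOutputStream(f))) {
        for (int i = 0; i < bytes; i++) out.write(i & 0xff);
      }
      return f;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```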
[jira] Updated: (HADOOP-1841) IPC server should write responses asynchronously
[ https://issues.apache.org/jira/browse/HADOOP-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HADOOP-1841: - Status: Open (was: Patch Available) Findbugs warnings. IPC server should write responses asynchronously Key: HADOOP-1841 URL: https://issues.apache.org/jira/browse/HADOOP-1841 Project: Hadoop Issue Type: Improvement Components: ipc Reporter: Doug Cutting Assignee: dhruba borthakur Fix For: 0.16.0 Attachments: asyncRPC-2.patch, asyncRPC-4.patch, asyncRPC-5.patch, asyncRPC-6.patch, asyncRPC.patch, asyncRPC.patch Hadoop's IPC Server currently writes responses from request handler threads using blocking writes. Performance and scalability might be improved if responses were written asynchronously. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2359) PendingReplicationMonitor thread received exception. java.lang.InterruptedException
PendingReplicationMonitor thread received exception. java.lang.InterruptedException --- Key: HADOOP-2359 URL: https://issues.apache.org/jira/browse/HADOOP-2359 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.16.0 Reporter: Owen O'Malley Assignee: dhruba borthakur Fix For: 0.16.0 I sometimes get the message: 07/12/05 19:01:36 WARN fs.FSNamesystem: PendingReplicationMonitor thread received exception. java.lang.InterruptedException: sleep interrupted from mini-dfs cluster. InterruptedExceptions should be handled quietly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2012) Periodic verification at the Datanode
[ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548765 ] Raghu Angadi commented on HADOOP-2012: -- For Windows, I will make it work most of the time; very rarely, some updates to the last verification time might fail, and that's ok. I will see how much this will take. Another option, of course, is to fix it properly and make it work equally well everywhere. Periodic verification at the Datanode - Key: HADOOP-2012 URL: https://issues.apache.org/jira/browse/HADOOP-2012 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Raghu Angadi Assignee: Raghu Angadi Fix For: 0.16.0 Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch Currently, on-disk corruption of data blocks is detected only when a block is read by the client or by another datanode. These errors would be detected much earlier if the datanode could periodically verify the data checksums for its local blocks. Some of the issues to consider: - How often should we check the blocks (no more often than once every couple of weeks?) - How do we keep track of when a block was last verified (there is a .meta file associated with each block). - What action to take once a corruption is detected. - Scanning should be done at a very low priority, with the rest of the datanode disk traffic in mind. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
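The "when was a block last verified" bookkeeping could look like the following sketch (a hypothetical helper, not the attached HADOOP-2012.patch): given each block's last verification time, pick the blocks due for a new checksum scan.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/**
 * Illustrative scheduler for periodic block verification: a block is
 * due when at least scanPeriod has elapsed since its last scan (e.g.
 * a couple of weeks). A real scanner would also throttle disk I/O.
 */
public class VerificationSchedule {
  public static List<String> blocksDue(Map<String, Long> lastVerified,
                                       long now, long scanPeriod) {
    List<String> due = new ArrayList<>();
    for (Map.Entry<String, Long> e : lastVerified.entrySet()) {
      if (now - e.getValue() >= scanPeriod) due.add(e.getKey());
    }
    Collections.sort(due); // deterministic order for the scan queue
    return due;
  }
}
```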
[jira] Commented: (HADOOP-2356) Set memcache flush size per table
[ https://issues.apache.org/jira/browse/HADOOP-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548731 ] Jim Kellerman commented on HADOOP-2356: --- Actually, since there is one memcache per column, this could be set on a per-column basis. Set memcache flush size per table - Key: HADOOP-2356 URL: https://issues.apache.org/jira/browse/HADOOP-2356 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: Paul Saab Priority: Minor The amount of memory taken by the memcache before a flush is currently a global parameter. It should be configurable per-table. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2342) create a micro-benchmark for measuring local-file versus hdfs read
[ https://issues.apache.org/jira/browse/HADOOP-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-2342: -- Status: Patch Available (was: Open) create a micro-benchmark for measuring local-file versus hdfs read Key: HADOOP-2342 URL: https://issues.apache.org/jira/browse/HADOOP-2342 Project: Hadoop Issue Type: Test Components: dfs Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.16.0 Attachments: throughput.patch We should have a benchmark that measures reading a 10g file from hdfs and from local disk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1841) IPC server should write responses asynchronously
[ https://issues.apache.org/jira/browse/HADOOP-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HADOOP-1841: - Attachment: asyncRPC-6.patch Fixed findbugs warnings. IPC server should write responses asynchronously Key: HADOOP-1841 URL: https://issues.apache.org/jira/browse/HADOOP-1841 Project: Hadoop Issue Type: Improvement Components: ipc Reporter: Doug Cutting Assignee: dhruba borthakur Fix For: 0.16.0 Attachments: asyncRPC-2.patch, asyncRPC-4.patch, asyncRPC-5.patch, asyncRPC-6.patch, asyncRPC.patch, asyncRPC.patch Hadoop's IPC Server currently writes responses from request handler threads using blocking writes. Performance and scalability might be improved if responses were written asynchronously. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2012) Periodic verification at the Datanode
[ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548756 ] eric baldeschwieler commented on HADOOP-2012: - I'd really rather see us get this right for supported platforms than declare an issue done when we know it does not work on some platforms. This is particularly vexing when design choices could clearly be made that would avoid these issues. -1 Periodic verification at the Datanode - Key: HADOOP-2012 URL: https://issues.apache.org/jira/browse/HADOOP-2012 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Raghu Angadi Assignee: Raghu Angadi Fix For: 0.16.0 Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch Currently, on-disk corruption of data blocks is detected only when a block is read by the client or by another datanode. These errors would be detected much earlier if the datanode could periodically verify the data checksums for its local blocks. Some of the issues to consider: - How often should we check the blocks (no more often than once every couple of weeks?) - How do we keep track of when a block was last verified (there is a .meta file associated with each block). - What action to take once a corruption is detected. - Scanning should be done at a very low priority, with the rest of the datanode disk traffic in mind. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-4) tool to mount dfs on linux
[ https://issues.apache.org/jira/browse/HADOOP-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548789 ] Anurag Sharma commented on HADOOP-4: hi Doug, Thanks for pointing out this issue. I will remove the FUSE-J patch and try one of the other routes you suggested (to have a patched FUSE-J available), and will come back with a resolution on this very soon. -anurag tool to mount dfs on linux -- Key: HADOOP-4 URL: https://issues.apache.org/jira/browse/HADOOP-4 Project: Hadoop Issue Type: Improvement Components: fs Affects Versions: 0.5.0 Environment: linux only Reporter: John Xing Assignee: Doug Cutting Attachments: fuse-hadoop-0.1.0_fuse-j.2.2.3_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.0_fuse-j.2.4_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.1.tar.gz, fuse-j-hadoopfs-0.1.zip, fuse-j-patch.zip tool to mount dfs on linux -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2355) Set region split size on table creation
[ https://issues.apache.org/jira/browse/HADOOP-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548733 ] Jim Kellerman commented on HADOOP-2355: --- The finest level of granularity for this parameter would be the table level, since a region split affects all the columns in a particular row range. Set region split size on table creation --- Key: HADOOP-2355 URL: https://issues.apache.org/jira/browse/HADOOP-2355 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: Paul Saab Priority: Minor Right now the region size before a split is determined by a global configuration. It would be nice to configure tables independently of the global parameter. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1652) Rebalance data blocks when new data nodes added or data nodes become full
[ https://issues.apache.org/jira/browse/HADOOP-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HADOOP-1652: - Resolution: Fixed Status: Resolved (was: Patch Available) I just committed this. Thanks Hairong! Rebalance data blocks when new data nodes added or data nodes become full - Key: HADOOP-1652 URL: https://issues.apache.org/jira/browse/HADOOP-1652 Project: Hadoop Issue Type: New Feature Components: dfs Affects Versions: 0.13.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.16.0 Attachments: balancer.patch, balancer1.patch, balancer2.patch, balancer3.patch, balancer4.patch, balancer5.patch, balancer6.patch, balancer7.patch, balancer8.patch, BalancerAdminGuide.pdf, BalancerAdminGuide1.pdf, BalancerUserGuide2.pdf, RebalanceDesign4.pdf, RebalanceDesign5.pdf, RebalanceDesign6.pdf When a new data node joins an hdfs cluster, it does not hold much data, so any map task assigned to the machine most likely does not read local data, thus increasing the use of network bandwidth. On the other hand, when some data nodes become full, new data blocks are placed only on non-full data nodes, thus reducing their read parallelism. This jira aims to find an approach to redistribute data blocks when imbalance occurs in the cluster. A solution should meet the following requirements: 1. It maintains data availability guarantees, in the sense that rebalancing does not reduce the number of replicas a block has or the number of racks the block resides on. 2. An administrator should be able to invoke and interrupt rebalancing from a command line. 3. Rebalancing should be throttled so that it does not cause a namenode to be too busy to serve any incoming request, or saturate the network. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
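Requirement 3 (throttling) can be sketched as a simple per-period byte budget (illustrative only; this is not the balancer patch's actual throttler, and the names are invented):

```java
/**
 * Toy bandwidth throttle: track bytes moved in the current period and,
 * once over budget, sleep out the remainder of the period before the
 * next transfer. Interrupts are preserved rather than swallowed.
 */
public class BandwidthThrottle {
  private final long bytesPerPeriod;
  private final long periodMillis;
  private long periodStart;
  private long bytesThisPeriod = 0;

  public BandwidthThrottle(long bytesPerPeriod, long periodMillis) {
    this.bytesPerPeriod = bytesPerPeriod;
    this.periodMillis = periodMillis;
    this.periodStart = System.currentTimeMillis();
  }

  /** Call after moving {@code bytes}; returns true if it had to sleep. */
  public synchronized boolean throttle(long bytes) {
    bytesThisPeriod += bytes;
    long now = System.currentTimeMillis();
    if (now - periodStart >= periodMillis) { // a new period has begun
      periodStart = now;
      bytesThisPeriod = bytes;
      return false;
    }
    if (bytesThisPeriod > bytesPerPeriod) {  // over budget: wait out the period
      try {
        Thread.sleep(periodMillis - (now - periodStart));
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();  // restore the interrupt flag
      }
      periodStart = System.currentTimeMillis();
      bytesThisPeriod = 0;
      return true;
    }
    return false;
  }
}
```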
[jira] Updated: (HADOOP-2160) separate website from user documentation
[ https://issues.apache.org/jira/browse/HADOOP-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Cutting updated HADOOP-2160: - Attachment: trunk.patch One can now check out http://svn.apache.org/repos/asf/lucene/hadoop/site/publish to preview the new top-level site. This is a patch for trunk, removing all of the project-level documentation, so that all that remains is end-user documentation, suitable for distribution with a release, or for linking to from the site. I've also added a docs target to the top-level build.xml that runs forrest. Unless there are objections, I'll commit this soon and update the website. separate website from user documentation Key: HADOOP-2160 URL: https://issues.apache.org/jira/browse/HADOOP-2160 Project: Hadoop Issue Type: Improvement Reporter: Doug Cutting Assignee: Doug Cutting Attachments: trunk.patch Currently the website only contains the documentation for a single release, the current release. It would be better if the website also contained documentation for past releases, since not everyone is using the current release. To implement this we should move the top-level of the website, including project and developer information, from the subversion trunk into a separate tree, so that only the user documentation is branched per release. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-4) tool to mount dfs on linux
[ https://issues.apache.org/jira/browse/HADOOP-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548807 ] Doug Cutting commented on HADOOP-4: --- I went through the license for Fuse-J and it is distributed under LGPL. Unfortunately, the ASF cannot host things published under LGPL either. Sorry! tool to mount dfs on linux -- Key: HADOOP-4 URL: https://issues.apache.org/jira/browse/HADOOP-4 Project: Hadoop Issue Type: Improvement Components: fs Affects Versions: 0.5.0 Environment: linux only Reporter: John Xing Assignee: Doug Cutting Attachments: fuse-hadoop-0.1.0_fuse-j.2.2.3_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.0_fuse-j.2.4_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.1.tar.gz, fuse-j-hadoopfs-0.1.zip, fuse-j-patch.zip tool to mount dfs on linux -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-4) tool to mount dfs on linux
[ https://issues.apache.org/jira/browse/HADOOP-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548806 ] Anurag Sharma commented on HADOOP-4: Hi Doug, I went through the license for Fuse-J and it is distributed under LGPL, do you think that would allow the Fuse-J patches to be hosted on Apache? (In the latter case we would still modify the submission above to be a contrib module that downloads Fuse-J, applies our patch, and builds it, except we won't have to find a place to host the patch). -thanks -anurag tool to mount dfs on linux -- Key: HADOOP-4 URL: https://issues.apache.org/jira/browse/HADOOP-4 Project: Hadoop Issue Type: Improvement Components: fs Affects Versions: 0.5.0 Environment: linux only Reporter: John Xing Assignee: Doug Cutting Attachments: fuse-hadoop-0.1.0_fuse-j.2.2.3_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.0_fuse-j.2.4_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.1.tar.gz, fuse-j-hadoopfs-0.1.zip, fuse-j-patch.zip tool to mount dfs on linux -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2359) PendingReplicationMonitor thread received exception. java.lang.InterruptedException
[ https://issues.apache.org/jira/browse/HADOOP-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HADOOP-2359: - Attachment: replicationWarning.patch Changed warning message to debug. PendingReplicationMonitor thread received exception. java.lang.InterruptedException --- Key: HADOOP-2359 URL: https://issues.apache.org/jira/browse/HADOOP-2359 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.16.0 Reporter: Owen O'Malley Assignee: dhruba borthakur Fix For: 0.16.0 Attachments: replicationWarning.patch I sometimes get the message: 07/12/05 19:01:36 WARN fs.FSNamesystem: PendingReplicationMonitor thread received exception. java.lang.InterruptedException: sleep interrupted from mini-dfs cluster. InterruptedExceptions should be handled quietly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
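The intent of the fix ("InterruptedExceptions should be handled quietly") can be sketched as a monitor thread that treats interruption as a normal shutdown signal rather than an error worth a WARN. This is an illustrative stand-in, not the actual PendingReplicationMonitor code.

```java
// Sketch: a periodic monitor that exits quietly on interrupt. In the real
// fix, the catch block logs at debug level instead of warn; here we simply
// record whether anything *other* than an interrupt went wrong.
class QuietMonitor implements Runnable {
    volatile boolean sawError = false;

    public void run() {
        try {
            while (true) {
                // periodic verification work would go here
                Thread.sleep(100);
            }
        } catch (InterruptedException ie) {
            // Expected during shutdown (e.g. mini-dfs cluster teardown):
            // exit quietly; at most log at debug level.
        } catch (RuntimeException e) {
            sawError = true;  // anything else is a genuine problem
        }
    }
}
```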
[jira] Updated: (HADOOP-2359) PendingReplicationMonitor thread received exception. java.lang.InterruptedException
[ https://issues.apache.org/jira/browse/HADOOP-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HADOOP-2359: - Status: Patch Available (was: Open) PendingReplicationMonitor thread received exception. java.lang.InterruptedException --- Key: HADOOP-2359 URL: https://issues.apache.org/jira/browse/HADOOP-2359 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.16.0 Reporter: Owen O'Malley Assignee: dhruba borthakur Fix For: 0.16.0 Attachments: replicationWarning.patch I sometimes get the message: 07/12/05 19:01:36 WARN fs.FSNamesystem: PendingReplicationMonitor thread received exception. java.lang.InterruptedException: sleep interrupted from mini-dfs cluster. InterruptedExceptions should be handled quietly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-496) Expose HDFS as a WebDAV store
[ https://issues.apache.org/jira/browse/HADOOP-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anurag Sharma updated HADOOP-496: - Attachment: (was: fuse-j-hadoopfs-0.zip) Expose HDFS as a WebDAV store - Key: HADOOP-496 URL: https://issues.apache.org/jira/browse/HADOOP-496 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Michel Tourn Assignee: Enis Soztutar Attachments: hadoop-496-3.patch, hadoop-496-4.patch, hadoop-496-spool-cleanup.patch, hadoop-webdav.zip, jetty-slide.xml, lib.webdav.tar.gz, screenshot-1.jpg, slideusers.properties, webdav_wip1.patch, webdav_wip2.patch WebDAV stands for Distributed Authoring and Versioning. It is a set of extensions to the HTTP protocol that lets users collaboratively edit and manage files on a remote web server. It is often considered a replacement for NFS or SAMBA. HDFS (Hadoop Distributed File System) needs a friendly file system interface. DFSShell commands are unfamiliar. Instead it is more convenient for Hadoop users to use a mountable network drive. A friendly interface to HDFS will be used both for casual browsing of data and for bulk import/export. The FUSE provider for HDFS is already available ( http://issues.apache.org/jira/browse/HADOOP-17 ) but it had scalability problems. WebDAV is a popular alternative. The typical licensing terms for WebDAV tools are also attractive: GPL for Linux client tools that Hadoop would not redistribute anyway. More importantly, Apache Project/Apache license for Java tools and for server components. This allows for a tighter integration with the HDFS code base. There are some interesting Apache projects that support WebDAV. 
But these are probably too heavyweight for the needs of Hadoop: Tomcat servlet: http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/servlets/WebdavServlet.html Slide: http://jakarta.apache.org/slide/ Being HTTP-based and backwards-compatible with Web Browser clients, the WebDAV server protocol could even be piggy-backed on the existing Web UI ports of the Hadoop name node / data nodes. WebDAV can be hosted as (Jetty) servlets. This minimizes server code bloat and avoids additional network traffic between HDFS and the WebDAV server. General Clients (read-only): Any web browser Linux Clients: Mountable GPL davfs2 http://dav.sourceforge.net/ FTP-like GPL Cadaver http://www.webdav.org/cadaver/ Server Protocol compliance tests: http://www.webdav.org/neon/litmus/ A goal is for Hadoop HDFS to pass this test (minus support for Properties) Pure Java clients: DAV Explorer Apache lic. http://www.ics.uci.edu/~webdav/ WebDAV also makes it convenient to add advanced features in an incremental fashion: file locking, access control lists, hard links, symbolic links. New WebDAV standards continue to be accepted, and WebDAV clients with varying levels of feature support exist. core http://www.webdav.org/specs/rfc2518.html ACLs http://www.webdav.org/specs/rfc3744.html redirects soft links http://greenbytes.de/tech/webdav/rfc4437.html BIND hard links http://www.webdav.org/bind/ quota http://tools.ietf.org/html/rfc4331 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2160) separate website from user documentation
[ https://issues.apache.org/jira/browse/HADOOP-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548734 ] Doug Cutting commented on HADOOP-2160: -- FYI, I will make some commits on this issue without first submitting patches, since it involves a lot of subversion commands that are not amenable to patches. separate website from user documentation Key: HADOOP-2160 URL: https://issues.apache.org/jira/browse/HADOOP-2160 Project: Hadoop Issue Type: Improvement Reporter: Doug Cutting Assignee: Doug Cutting Currently the website only contains the documentation for a single release, the current release. It would be better if the website also contained documentation for past releases, since not everyone is using the current release. To implement this we should move the top-level of the website, including project and developer information, from the subversion trunk into a separate tree, so that only the user documentation is branched per release. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2012) Periodic verification at the Datanode
[ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548797 ] Konstantin Shvachko commented on HADOOP-2012: - The question here is whether we would go with our current decision if we knew it would not be supported on Windows? If we let the balancer write log-type data (verified block #s) into a special file balancer.log instead of modifying meta-data files, will that be a problem? Looks like Eric already had a proposal of scanning blocks in a predetermined order. Should we reconsider this? Periodic verification at the Datanode - Key: HADOOP-2012 URL: https://issues.apache.org/jira/browse/HADOOP-2012 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Raghu Angadi Assignee: Raghu Angadi Fix For: 0.16.0 Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch Currently on-disk data corruption on data blocks is detected only when it is read by the client or by another datanode. These errors are detected much earlier if the datanode can periodically verify the data checksums for the local blocks. Some of the issues to consider : - How should we check the blocks (no more often than once every couple of weeks?) - How do we keep track of when a block was last verified (there is a .meta file associated with each block). - What action to take once a corruption is detected - Scanning should be done as a very low priority with the rest of the datanode disk traffic in mind. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
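The core check the issue describes (recompute a block's checksum and compare it with the value recorded alongside the block in its .meta file) can be sketched like this. BlockVerifier is a hypothetical helper for illustration, not the actual datanode code, and it uses CRC32 as a stand-in checksum.

```java
import java.util.zip.CRC32;

// Sketch of periodic block verification: recompute the checksum of a
// block's bytes and compare it with the stored value. A mismatch means
// on-disk corruption and would trigger re-replication from a good replica.
class BlockVerifier {
    static long checksum(byte[] blockData) {
        CRC32 crc = new CRC32();
        crc.update(blockData, 0, blockData.length);
        return crc.getValue();
    }

    // A block passes verification iff its current checksum matches the stored one.
    static boolean verify(byte[] blockData, long storedChecksum) {
        return checksum(blockData) == storedChecksum;
    }
}
```

The scheduling questions in the issue (how often, how to record the last-verified time, how to keep scanning low-priority) sit around this check rather than inside it.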
[jira] Created: (HADOOP-2360) hadoop::RecordReader::read() throws exception in HadoopPipes::RecordWriter
hadoop::RecordReader::read() throws exception in HadoopPipes::RecordWriter -- Key: HADOOP-2360 URL: https://issues.apache.org/jira/browse/HADOOP-2360 Project: Hadoop Issue Type: Bug Affects Versions: 0.14.3 Reporter: Yiping Han Priority: Blocker The jute record has the format: class SampleValue { ustring data; } And HadoopPipes::RecordWriter::emit() has code like this: void SampleRecordWriterC::emit(const std::string& key, const std::string& value) { if (key.empty() || value.empty()) { return; } hadoop::StringInStream key_in_stream(const_cast<std::string&>(key)); hadoop::RecordReader key_record_reader(key_in_stream, hadoop::kCSV); EmitKeyT emit_key; key_record_reader.read(emit_key); hadoop::StringInStream value_in_stream(const_cast<std::string&>(value)); hadoop::RecordReader value_record_reader(value_in_stream, hadoop::kCSV); EmitValueT emit_value; value_record_reader.read(emit_value); return; } And the code throws hadoop::IOException at the read() line. In the mapper, I have faked the emitted record with the following code: std::string value; EmitValueT emit_value; emit_value.getData().assign(FakeData); hadoop::StringOutStream value_out_stream(value); hadoop::RecordWriter value_record_writer(value_out_stream, hadoop::kCSV); value_record_writer.write(emit_value); We haven't updated to the latest version of hadoop, but I've searched the tickets and didn't find one reporting this problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-1298) adding user info to file
[ https://issues.apache.org/jira/browse/HADOOP-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HADOOP-1298: --- Attachment: (was: 20071116b.patch) adding user info to file Key: HADOOP-1298 URL: https://issues.apache.org/jira/browse/HADOOP-1298 Project: Hadoop Issue Type: New Feature Components: dfs, fs Reporter: Kurtis Heimerl Assignee: Christophe Taton Attachments: 1298_2007-09-22_1.patch, 1298_2007-10-04_1.patch, 1298_20071205.patch, hadoop-user-munncha.patch17 I'm working on adding a permissions model to hadoop's DFS. The first step is this change, which associates user info with files. Following this I'll associate permissions info, then block methods based on that user info, then authorization of the user info. So, right now I've implemented adding user info to files. I'm looking for feedback before I clean this up and make it official. I wasn't sure what release; I'm working off trunk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1841) IPC server should write responses asynchronously
[ https://issues.apache.org/jira/browse/HADOOP-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548689 ] Hadoop QA commented on HADOOP-1841: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12371003/asyncRPC-5.patch against trunk revision r601232. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs -1. The patch appears to introduce 2 new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests -1. The patch failed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1272/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1272/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1272/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1272/console This message is automatically generated. IPC server should write responses asynchronously Key: HADOOP-1841 URL: https://issues.apache.org/jira/browse/HADOOP-1841 Project: Hadoop Issue Type: Improvement Components: ipc Reporter: Doug Cutting Assignee: dhruba borthakur Fix For: 0.16.0 Attachments: asyncRPC-2.patch, asyncRPC-4.patch, asyncRPC-5.patch, asyncRPC.patch, asyncRPC.patch Hadoop's IPC Server currently writes responses from request handler threads using blocking writes. Performance and scalability might be improved if responses were written asynchronously. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2012) Periodic verification at the Datanode
[ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548824 ] Raghu Angadi commented on HADOOP-2012: -- The question here is whether we would go with our current decision if we knew it would not be supported on Windows? A related concern is whether this is required for appends anyway. Periodic verification at the Datanode - Key: HADOOP-2012 URL: https://issues.apache.org/jira/browse/HADOOP-2012 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Raghu Angadi Assignee: Raghu Angadi Fix For: 0.16.0 Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch Currently on-disk data corruption on data blocks is detected only when it is read by the client or by another datanode. These errors are detected much earlier if the datanode can periodically verify the data checksums for the local blocks. Some of the issues to consider : - How should we check the blocks (no more often than once every couple of weeks?) - How do we keep track of when a block was last verified (there is a .meta file associated with each block). - What action to take once a corruption is detected - Scanning should be done as a very low priority with the rest of the datanode disk traffic in mind. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1841) IPC server should write responses asynchronously
[ https://issues.apache.org/jira/browse/HADOOP-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548799 ] Hadoop QA commented on HADOOP-1841: --- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12371062/asyncRPC-6.patch against trunk revision r601383. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1273/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1273/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1273/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1273/console This message is automatically generated. IPC server should write responses asynchronously Key: HADOOP-1841 URL: https://issues.apache.org/jira/browse/HADOOP-1841 Project: Hadoop Issue Type: Improvement Components: ipc Reporter: Doug Cutting Assignee: dhruba borthakur Fix For: 0.16.0 Attachments: asyncRPC-2.patch, asyncRPC-4.patch, asyncRPC-5.patch, asyncRPC-6.patch, asyncRPC.patch, asyncRPC.patch Hadoop's IPC Server currently writes responses from request handler threads using blocking writes. Performance and scalability might be improved if responses were written asynchronously. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
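The idea behind this issue can be sketched as a producer/consumer split: handler threads hand finished responses to a queue instead of writing to the socket themselves, and a dedicated responder drains the queue, so a slow client cannot stall a handler. This is a minimal illustration only; the actual patch uses non-blocking NIO channel writes, and the names below are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: decouple RPC handlers from the network. Handlers enqueue and
// return immediately; only the responder thread touches the "socket"
// (a StringBuilder stands in for it here).
class AsyncResponder {
    private final BlockingQueue<byte[]> pending = new LinkedBlockingQueue<>();
    final StringBuilder wire = new StringBuilder();  // stand-in for the client socket

    // Called by handler threads: never blocks on the network.
    void enqueue(byte[] response) {
        pending.add(response);
    }

    // Run repeatedly by the single responder thread.
    void drainOnce() throws InterruptedException {
        byte[] r = pending.take();
        wire.append(new String(r));  // the only place that writes to the wire
    }
}
```

With this split, handler threads are freed to serve new requests as soon as the response is computed, which is where the scalability win comes from.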
[jira] Updated: (HADOOP-1652) Rebalance data blocks when new data nodes added or data nodes become full
[ https://issues.apache.org/jira/browse/HADOOP-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HADOOP-1652: -- Attachment: balancer8.patch The patch has a minor change to make the junit test run faster. Rebalance data blocks when new data nodes added or data nodes become full - Key: HADOOP-1652 URL: https://issues.apache.org/jira/browse/HADOOP-1652 Project: Hadoop Issue Type: New Feature Components: dfs Affects Versions: 0.13.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.16.0 Attachments: balancer.patch, balancer1.patch, balancer2.patch, balancer3.patch, balancer4.patch, balancer5.patch, balancer6.patch, balancer7.patch, balancer8.patch, BalancerAdminGuide.pdf, BalancerAdminGuide1.pdf, BalancerUserGuide2.pdf, RebalanceDesign4.pdf, RebalanceDesign5.pdf, RebalanceDesign6.pdf When a new data node joins the hdfs cluster, it does not hold much data. So any map task assigned to the machine most likely does not read local data, thus increasing the use of network bandwidth. On the other hand, when some data nodes become full, new data blocks are placed on only non-full data nodes, thus reducing their read parallelism. This jira aims to find an approach to redistribute data blocks when imbalance occurs in the cluster. A solution should meet the following requirements: 1. It maintains data availability guarantees in the sense that rebalancing does not reduce the number of replicas that a block has or the number of racks that the block resides on. 2. An administrator should be able to invoke and interrupt rebalancing from a command line. 3. Rebalancing should be throttled so that rebalancing does not cause a namenode to be too busy to serve any incoming request or saturate the network. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HADOOP-2357) [hbase] Compaction cleanup; less deleting + prevent possible file leaks
[ https://issues.apache.org/jira/browse/HADOOP-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HADOOP-2357. --- Resolution: Fixed Resolving. Committed the compaction.patch from over in HADOOP-2283. [hbase] Compaction cleanup; less deleting + prevent possible file leaks --- Key: HADOOP-2357 URL: https://issues.apache.org/jira/browse/HADOOP-2357 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: stack Priority: Minor Fix For: 0.16.0 This issue is being created so I can commit the compaction patch that just passed hudson over in HADOOP-2283. That issue is about trouble accessing hdfs. It should stay open since we haven't yet figured out what's up. As a by-product of the investigation, the compaction patch was generated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HADOOP-1327) Doc on Streaming
[ https://issues.apache.org/jira/browse/HADOOP-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley reassigned HADOOP-1327: - Assignee: Rob Weltman Doc on Streaming Key: HADOOP-1327 URL: https://issues.apache.org/jira/browse/HADOOP-1327 Project: Hadoop Issue Type: Improvement Components: documentation Reporter: Runping Qi Assignee: Rob Weltman Attachments: HADOOP-1327.patch, site.xml, streaming.html, streaming.xml -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2012) Periodic verification at the Datanode
[ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548828 ] dhruba borthakur commented on HADOOP-2012: -- For appends, a reader can be reading the datafile and metafile while the writer is still writing to them. This is supported on Linux as well as Windows. Periodic verification at the Datanode - Key: HADOOP-2012 URL: https://issues.apache.org/jira/browse/HADOOP-2012 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Raghu Angadi Assignee: Raghu Angadi Fix For: 0.16.0 Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch Currently on-disk data corruption on data blocks is detected only when it is read by the client or by another datanode. These errors are detected much earlier if the datanode can periodically verify the data checksums for the local blocks. Some of the issues to consider : - How should we check the blocks (no more often than once every couple of weeks?) - How do we keep track of when a block was last verified (there is a .meta file associated with each block). - What action to take once a corruption is detected - Scanning should be done as a very low priority with the rest of the datanode disk traffic in mind. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2342) create a micro-benchmark to measure local-file versus hdfs read
[ https://issues.apache.org/jira/browse/HADOOP-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548820 ] Hadoop QA commented on HADOOP-2342: --- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12371063/throughput.patch against trunk revision r601491. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1274/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1274/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1274/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1274/console This message is automatically generated. create a micro-benchmark to measure local-file versus hdfs read Key: HADOOP-2342 URL: https://issues.apache.org/jira/browse/HADOOP-2342 Project: Hadoop Issue Type: Test Components: dfs Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.16.0 Attachments: throughput.patch We should have a benchmark that measures reading a 10g file from hdfs and from local disk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
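The core of such a benchmark is just a timed chunked read. The sketch below takes any InputStream so the same loop can be pointed at a local file or an HDFS stream; the class and method names are illustrative, not the contents of throughput.patch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of the benchmark's inner loop: read a stream to exhaustion in
// large chunks and return the byte count. A driver would wrap this in
// System.currentTimeMillis() calls and divide bytes by elapsed seconds
// to report MB/s for the local-file and hdfs cases.
class ReadThroughput {
    static long readAll(InputStream in, byte[] buf) throws IOException {
        long total = 0;
        int n;
        while ((n = in.read(buf)) > 0) {
            total += n;  // count every byte actually delivered
        }
        return total;
    }
}
```

For the real 10g comparison, the driver would open the file once per run and use a buffer large enough (e.g. 64 KB or more) that per-call overhead does not dominate the measurement.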
[jira] Commented: (HADOOP-2012) Periodic verification at the Datanode
[ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548832 ] Raghu Angadi commented on HADOOP-2012: -- This is supported on Linux as well as Windows. Can we use that code here? I'm guessing it handles upgraded directories also... Periodic verification at the Datanode - Key: HADOOP-2012 URL: https://issues.apache.org/jira/browse/HADOOP-2012 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Raghu Angadi Assignee: Raghu Angadi Fix For: 0.16.0 Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch Currently on-disk data corruption on data blocks is detected only when it is read by the client or by another datanode. These errors are detected much earlier if the datanode can periodically verify the data checksums for the local blocks. Some of the issues to consider : - How should we check the blocks (no more often than once every couple of weeks?) - How do we keep track of when a block was last verified (there is a .meta file associated with each block). - What action to take once a corruption is detected - Scanning should be done as a very low priority with the rest of the datanode disk traffic in mind. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2342) create a micro-benchmark to measure local-file versus hdfs read
[ https://issues.apache.org/jira/browse/HADOOP-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-2342: -- Status: Open (was: Patch Available) create a micro-benchmark to measure local-file versus hdfs read Key: HADOOP-2342 URL: https://issues.apache.org/jira/browse/HADOOP-2342 Project: Hadoop Issue Type: Test Components: dfs Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.16.0 Attachments: throughput.patch We should have a benchmark that measures reading a 10g file from hdfs and from local disk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2342) create a micro-benchmark to measure local-file versus hdfs read
[ https://issues.apache.org/jira/browse/HADOOP-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-2342: -- Status: Patch Available (was: Open) Needs to be re-reviewed by QA. create a micro-benchmark to measure local-file versus hdfs read Key: HADOOP-2342 URL: https://issues.apache.org/jira/browse/HADOOP-2342 Project: Hadoop Issue Type: Test Components: dfs Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.16.0 Attachments: throughput.patch We should have a benchmark that measures reading a 10g file from hdfs and from local disk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2342) create a micro-benchmark to measure local-file versus hdfs read
[ https://issues.apache.org/jira/browse/HADOOP-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HADOOP-2342: -- Attachment: (was: throughput.patch) create a micro-benchmark to measure local-file versus hdfs read Key: HADOOP-2342 URL: https://issues.apache.org/jira/browse/HADOOP-2342 Project: Hadoop Issue Type: Test Components: dfs Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.16.0 Attachments: throughput.patch We should have a benchmark that measures reading a 10g file from hdfs and from local disk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2359) PendingReplicationMonitor thread received exception. java.lang.InterruptedException
[ https://issues.apache.org/jira/browse/HADOOP-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548847 ] Hadoop QA commented on HADOOP-2359: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12371067/replicationWarning.patch against trunk revision r601518. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests -1. The patch failed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1275/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1275/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1275/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1275/console This message is automatically generated. PendingReplicationMonitor thread received exception. java.lang.InterruptedException --- Key: HADOOP-2359 URL: https://issues.apache.org/jira/browse/HADOOP-2359 Project: Hadoop Issue Type: Bug Components: dfs Affects Versions: 0.16.0 Reporter: Owen O'Malley Assignee: dhruba borthakur Fix For: 0.16.0 Attachments: replicationWarning.patch I sometimes get the message: 07/12/05 19:01:36 WARN fs.FSNamesystem: PendingReplicationMonitor thread received exception. java.lang.InterruptedException: sleep interrupted from mini-dfs cluster. InterruptedExceptions should be handled quietly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-4) tool to mount dfs on linux
[ https://issues.apache.org/jira/browse/HADOOP-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548852 ] Anurag Sharma commented on HADOOP-4: hi Doug. ok :- ), we will follow one of the alternate options you suggested of hosting either the patch or the jar file ourselves, and fixing the fuse-j-hadoop package build to work with this. Will re-submit our changes soon. -thanks, -anurag tool to mount dfs on linux -- Key: HADOOP-4 URL: https://issues.apache.org/jira/browse/HADOOP-4 Project: Hadoop Issue Type: Improvement Components: fs Affects Versions: 0.5.0 Environment: linux only Reporter: John Xing Assignee: Doug Cutting Attachments: fuse-hadoop-0.1.0_fuse-j.2.2.3_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.0_fuse-j.2.4_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.1.tar.gz, fuse-j-hadoopfs-0.1.zip, fuse-j-patch.zip tool to mount dfs on linux -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-4) tool to mount dfs on linux
[ https://issues.apache.org/jira/browse/HADOOP-4?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anurag Sharma updated HADOOP-4: --- Attachment: (was: fuse-j-patch.zip) tool to mount dfs on linux -- Key: HADOOP-4 URL: https://issues.apache.org/jira/browse/HADOOP-4 Project: Hadoop Issue Type: Improvement Components: fs Affects Versions: 0.5.0 Environment: linux only Reporter: John Xing Assignee: Doug Cutting Attachments: fuse-hadoop-0.1.0_fuse-j.2.2.3_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.0_fuse-j.2.4_hadoop.0.5.0.tar.gz, fuse-hadoop-0.1.1.tar.gz, fuse-j-hadoopfs-0.1.zip tool to mount dfs on linux -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2329) [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data
[ https://issues.apache.org/jira/browse/HADOOP-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548855 ] Bryan Duxbury commented on HADOOP-2329: --- I don't think there should be a type field. That's up to the application to deal with. It would add a ton of overhead to everything in HBase and require a huge overhaul of how stuff works. It would also take away a good deal of flexibility. The fact that the shell cannot understand user-supplied key/value based data types is not a good motivation for adding it. The shell should really only be an administrative utility anyway, just enough to be able to create and drop tables and to peek at a row here or there. I doubt that people who write their applications to use HBase are going to be limited by the lack of built-in data types. [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data Key: HADOOP-2329 URL: https://issues.apache.org/jira/browse/HADOOP-2329 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Fix For: 0.16.0 A built-in data type is a fundamental data type that the hbase shell defines. (character strings, scalars, ranges, arrays, ... , etc) If you need a specialized data type that is not currently provided as a built-in type, you are encouraged to write your own user-defined data type using UDC (not yet implemented). (or contribute it for distribution in a future release of hbase shell) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Reprioritize HBase issues in JIRA
Hey all, It seems to me like there's a lot of mis-prioritized HBase issues in the JIRA at the moment, since the default is Major. I'd like to give it a once-over and reprioritize the tickets, if no one objects. I think it would make our project easier to assess at a glance. -Bryan Duxbury
[jira] Created: (HADOOP-2361) hadoop version wrong in 0.15.1
hadoop version wrong in 0.15.1 -- Key: HADOOP-2361 URL: https://issues.apache.org/jira/browse/HADOOP-2361 Project: Hadoop Issue Type: Bug Components: build Affects Versions: 0.15.1 Reporter: lohit vijayarenu I downloaded 0.15.1 release, recompiled and executed ./bin/hadoop version. It says 0.15.2-dev picking it from build.xml -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2328) [Hbase Shell] Non-index join columns
[ https://issues.apache.org/jira/browse/HADOOP-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2328: -- Priority: Trivial (was: Major) [Hbase Shell] Non-index join columns Key: HADOOP-2328 URL: https://issues.apache.org/jira/browse/HADOOP-2328 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 Attachments: 2328.patch, 2328_v02.patch If we don't have an index for a domain in the join, we can still improve on the nested-loop join using sort join. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2283) [hbase] Stuck replay of failed regionserver edits
[ https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548674 ] Hadoop QA commented on HADOOP-2283: --- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12371004/compaction.patch against trunk revision r601232. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1271/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1271/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1271/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1271/console This message is automatically generated. [hbase] Stuck replay of failed regionserver edits - Key: HADOOP-2283 URL: https://issues.apache.org/jira/browse/HADOOP-2283 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Assignee: stack Fix For: 0.16.0 Attachments: compaction.patch, OP_READ.patch Looking in master for a cluster of ~90 regionservers, the regionserver carrying the ROOT went down (because it hadn't talked to the master in 30 seconds). Master notices the downed regionserver because its lease timesout. It then goes to run the shutdown server sequence only splitting the regionserver's edit log, it gets stuck trying to split the second of three log files. 
Eventually, after ~5 minutes, the second log split throws:
2007-11-26 01:21:23,999 WARN hbase.HMaster - Processing pending operations: ProcessServerShutdown of XX.XX.XX.XX:60020
org.apache.hadoop.dfs.AlreadyBeingCreatedException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /hbase/hregion_-1194436719/oldlogfile.log for DFSClient_610028837 on client XX.XX.XX.XX because current leaseholder is trying to recreate file.
  at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:848)
  at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:804)
  at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
  at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
  at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
  at org.apache.hadoop.hbase.HMaster.run(HMaster.java:1094)
And so on every 5 minutes. Because the regionserver that went down had ROOT region, and because we are stuck in this eternal loop, ROOT never gets reallocated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2264) Support an OutputFormat for HQL row data
[ https://issues.apache.org/jira/browse/HADOOP-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2264: -- Priority: Trivial (was: Major) Not major issue. Support an OutputFormat for HQL row data Key: HADOOP-2264 URL: https://issues.apache.org/jira/browse/HADOOP-2264 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: Paul Saab Assignee: Edward Yoon Priority: Trivial Currently when selecting a row, if the data does not convert to a String the hbase shell will print garbage. It would be nice if HQL supported a mechanism to format individual columns. Something along the lines of: select col1: format(SomeFormatClass), col2: format(AnotherFormatClass) from table -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2265) [Hbase Shell] Addition of LIKE operator for a select-condition
[ https://issues.apache.org/jira/browse/HADOOP-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2265: -- Priority: Trivial (was: Major) Not a major issue. [Hbase Shell] Addition of LIKE operator for a select-condition -- Key: HADOOP-2265 URL: https://issues.apache.org/jira/browse/HADOOP-2265 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.15.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 The LIKE operator is used in character string comparisons with pattern matching. With the LIKE operator, you can compare a value to a pattern rather than to a constant. SYNTAX : {code} [NOT] LIKE 'character' [ESCAPE 'character'] {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
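The proposed LIKE semantics map naturally onto Java regular expressions. As a hedged illustration only (this helper is hypothetical, not part of the HQL parser or the eventual patch), a shell could rewrite a LIKE pattern into a regex, honoring the optional ESCAPE character:

```java
// Hypothetical sketch: translating a SQL-style LIKE pattern to a Java regex.
// '%' matches any sequence, '_' a single character; the escape character
// lets a pattern match literal '%' or '_'. Names are illustrative.
public class LikePattern {
    public static String toRegex(String pattern, char escape) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == escape && i + 1 < pattern.length()) {
                // Escaped character: match it literally.
                regex.append(java.util.regex.Pattern.quote(
                    String.valueOf(pattern.charAt(++i))));
            } else if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append(".");
            } else {
                regex.append(java.util.regex.Pattern.quote(String.valueOf(c)));
            }
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        System.out.println("Harley davidson".matches(toRegex("Har%", '\\'))); // true
    }
}
```

The same translation would work for NOT LIKE by negating the match result.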
[jira] Updated: (HADOOP-2143) [Hbase Shell] Cell-value index option using lucene.
[ https://issues.apache.org/jira/browse/HADOOP-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2143: -- Priority: Trivial (was: Major) Not a major issue. [Hbase Shell] Cell-value index option using lucene. --- Key: HADOOP-2143 URL: https://issues.apache.org/jira/browse/HADOOP-2143 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.14.3 Environment: all environments Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 value, row-key1, row-key2[, row-key3] -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2329) [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data
[ https://issues.apache.org/jira/browse/HADOOP-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548869 ] Edward Yoon commented on HADOOP-2329: - The shell should really only be an administrative utility anyway, just enough to be able to create and drop tables and to peek at a row here or there. I don't think so. What do you think of this statement: "The shell should really only be an administrative utility anyway, just enough to *reboot* and *dir* and to peek at a *file name* here or there." It would add a ton of overhead to everything in HBase and require a huge overhaul of how stuff works. It would also take away a good deal of flexibility. I don't think so; internally you can still just use byte[]. Also, application developers need to do data modeling on Hbase. (It's very difficult in my experience, so the shell's guidance will be very useful.) I doubt that people who write their applications to use HBase are going to be limited by the lack of built-in data types. I don't think so. If you have studied databases and math, you can use DB solutions very powerfully. But many people (application developers) can't. Why? Please think about it some more. [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data Key: HADOOP-2329 URL: https://issues.apache.org/jira/browse/HADOOP-2329 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 A built-in data type is a fundamental data type that the hbase shell defines. (character strings, scalars, ranges, arrays, ... , etc) If you need a specialized data type that is not currently provided as a built-in type, you are encouraged to write your own user-defined data type using UDC (not yet implemented). (or contribute it for distribution in a future release of hbase shell) -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
RE: Reprioritize HBase issues in JIRA
I think this is a very subjective judgment of some application ability. -- B. Regards, Edward yoon @ NHN, corp. Home : http://www.udanax.org From: [EMAIL PROTECTED] To: hadoop-dev@lucene.apache.org Date: Wed, 5 Dec 2007 15:01:41 -0800 Subject: RE: Reprioritize HBase issues in JIRA Yes, that would be a big help. Go for it! And thanks for the help. --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: Bryan Duxbury [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 05, 2007 2:59 PM To: hadoop-dev@lucene.apache.org Subject: Reprioritize HBase issues in JIRA Hey all, It seems to me like there's a lot of mis-prioritized HBase issues in the JIRA at the moment, since the default is Major. I'd like to give it a once-over and reprioritize the tickets, if no one objects. I think it would make our project easier to assess at a glance. -Bryan Duxbury
[jira] Commented: (HADOOP-2339) [Hbase Shell] Delete command with no WHERE clause
[ https://issues.apache.org/jira/browse/HADOOP-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548871 ] Bryan Duxbury commented on HADOOP-2339: --- To clarify this issue, are we talking about what is essentially an ALTER TABLE DROP COLUMN in SQL? If so, the description should be changed to reflect that. [Hbase Shell] Delete command with no WHERE clause - Key: HADOOP-2339 URL: https://issues.apache.org/jira/browse/HADOOP-2339 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Fix For: 0.16.0 Attachments: 2339.patch, 2339_v02.patch, 2339_v03.patch, 2339_v04.patch using HbaseAdmin.deleteColumn() method. {code} DELETE column_name FROM table_name; {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2339) [Hbase Shell] Delete command with no WHERE clause
[ https://issues.apache.org/jira/browse/HADOOP-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2339: -- Priority: Minor (was: Major) Not a major issue, but should still be looked at. [Hbase Shell] Delete command with no WHERE clause - Key: HADOOP-2339 URL: https://issues.apache.org/jira/browse/HADOOP-2339 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Minor Fix For: 0.16.0 Attachments: 2339.patch, 2339_v02.patch, 2339_v03.patch, 2339_v04.patch using HbaseAdmin.deleteColumn() method. {code} DELETE column_name FROM table_name; {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2351) [Hbase Shell] If select command returns no result, it doesn't need to show the header information.
[ https://issues.apache.org/jira/browse/HADOOP-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2351: -- Priority: Trivial (was: Major) Doesn't make a functional change, only cosmetic. Not a major issue. [Hbase Shell] If select command returns no result, it doesn't need to show the header information. -- Key: HADOOP-2351 URL: https://issues.apache.org/jira/browse/HADOOP-2351 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 Attachments: 2351.patch {code} hql select * from udanax; +-+-+-+ | Row | Column | Cell | +-+-+-+ 0 row(s) in set. (0.09 sec) hql exit; {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2006) Aggregate Functions in select statement
[ https://issues.apache.org/jira/browse/HADOOP-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548874 ] Bryan Duxbury commented on HADOOP-2006: --- This seems like a bad idea. You could have TONS of data, and aggregating it in one place would take forever. If you want to produce aggregate info, you should probably fire off a Map Reduce job, no? Aggregate Functions in select statement --- Key: HADOOP-2006 URL: https://issues.apache.org/jira/browse/HADOOP-2006 Project: Hadoop Issue Type: Sub-task Components: contrib/hbase Affects Versions: 0.14.1 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Minor Fix For: 0.16.0 Aggregation functions on collections of data values: average, minimum, maximum, sum, count. Group rows by the value of a columnfamily and apply the aggregate function independently to each group of rows. * Grouping columnfamilies ƒ ~function_list~ (Relation) {code} select producer, avg(year) from movieLog_table group by producer {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
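The aggregation being debated, e.g. avg(year) grouped by producer, is a group-then-fold. A minimal plain-Java sketch of that computation follows; names and the two-column row layout are illustrative, and in the distributed setting Bryan suggests, the grouping would be Hadoop's shuffle and the averaging a reduce step rather than a single-process stream:

```java
import java.util.*;
import java.util.stream.*;

public class GroupByAverage {
    // Group rows by producer (column 0) and average the year (column 1).
    // A self-contained stand-in for "select producer, avg(year) ... group by producer".
    public static Map<String, Double> avgYearByProducer(List<String[]> rows) {
        return rows.stream().collect(Collectors.groupingBy(
            r -> r[0],                                          // key: producer
            Collectors.averagingDouble(r -> Double.parseDouble(r[1]))));
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[]{"producerA", "1999"},
            new String[]{"producerA", "2001"},
            new String[]{"producerB", "2005"});
        System.out.println(avgYearByProducer(rows));
    }
}
```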
[jira] Commented: (HADOOP-2361) hadoop version wrong in 0.15.1
[ https://issues.apache.org/jira/browse/HADOOP-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548875 ] lohit vijayarenu commented on HADOOP-2361: -- my bad, looks like i picked 0.15 branch instead of tag. closing this as invalid hadoop version wrong in 0.15.1 -- Key: HADOOP-2361 URL: https://issues.apache.org/jira/browse/HADOOP-2361 Project: Hadoop Issue Type: Bug Components: build Affects Versions: 0.15.1 Reporter: lohit vijayarenu I downloaded 0.15.1 release, recompiled and executed ./bin/hadoop version. It says 0.15.2-dev picking it from build.xml -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HADOOP-2361) hadoop version wrong in 0.15.1
[ https://issues.apache.org/jira/browse/HADOOP-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lohit vijayarenu resolved HADOOP-2361. -- Resolution: Invalid hadoop version wrong in 0.15.1 -- Key: HADOOP-2361 URL: https://issues.apache.org/jira/browse/HADOOP-2361 Project: Hadoop Issue Type: Bug Components: build Affects Versions: 0.15.1 Reporter: lohit vijayarenu I downloaded 0.15.1 release, recompiled and executed ./bin/hadoop version. It says 0.15.2-dev picking it from build.xml -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2338) [hbase] NPE in master server
[ https://issues.apache.org/jira/browse/HADOOP-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2338: -- Priority: Critical (was: Major) This is an important issue. [hbase] NPE in master server Key: HADOOP-2338 URL: https://issues.apache.org/jira/browse/HADOOP-2338 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Jim Kellerman Assignee: Jim Kellerman Priority: Critical Fix For: 0.16.0 Attachments: master.log.gz, patch.txt Master gets an NPE after receiving multiple responses from the same server telling the master it has opened a region.
{code}
2007-12-02 20:31:37,515 DEBUG hbase.HRegion - Next sequence id for region postlog,img254/577/02suecia024richardburnson0.jpg,1196619667879 is 73377537
2007-12-02 20:31:37,517 INFO hbase.HRegion - region postlog,img254/577/02suecia024richardburnson0.jpg,1196619667879 available
2007-12-02 20:31:39,200 WARN hbase.HRegionServer - Processing message (Retry: 0)
java.io.IOException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hbase.HMaster.processMsgs(HMaster.java:1484)
  at org.apache.hadoop.hbase.HMaster.regionServerReport(HMaster.java:1423)
  at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
  at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
  at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:48)
  at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:759)
  at java.lang.Thread.run(Thread.java:619)

case HMsg.MSG_REPORT_PROCESS_OPEN:
  synchronized (this.assignAttempts) {
    // Region server has acknowledged request to open region.
    // Extend region open time by 1/2 max region open time.
    **1484** assignAttempts.put(region.getRegionName(),
        Long.valueOf(assignAttempts.get(region.getRegionName()).longValue()
            + (this.maxRegionOpenTime / 2)));
  }
  break;
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
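For reference, the failure mode at line 1484 is the unguarded Map.get(...).longValue() chain: if a duplicate "region opened" acknowledgment arrives after the entry has been removed, get() returns null and the longValue() call throws. The guard pattern can be shown standalone; this is an illustrative plain-Java sketch that mirrors the snippet's names, not the attached patch.txt:

```java
import java.util.*;

public class AssignAttempts {
    // Illustrative stand-in for HMaster's assignAttempts map: region name -> deadline.
    private final Map<String, Long> assignAttempts = new HashMap<>();
    private final long maxRegionOpenTime = 60_000L;

    public void markPending(String regionName, long deadline) {
        assignAttempts.put(regionName, deadline);
    }

    // Extend the deadline by half the max open time, but only if the region
    // is still pending. The null check is what the original code lacked.
    public boolean extendDeadline(String regionName) {
        Long current = assignAttempts.get(regionName);
        if (current == null) {
            return false; // region no longer pending; ignore the duplicate ack
        }
        assignAttempts.put(regionName, current + maxRegionOpenTime / 2);
        return true;
    }

    public Long deadline(String regionName) {
        return assignAttempts.get(regionName);
    }
}
```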
[jira] Commented: (HADOOP-2329) [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data
[ https://issues.apache.org/jira/browse/HADOOP-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548877 ] Jim Kellerman commented on HADOOP-2329: --- One of the stated goals of the HBase project is to produce a system as similar to Bigtable as possible (see http://wiki.apache.org/lucene-hadoop/Hbase#goals). In this spirit, HBase will remain typeless and it is likely that we will go ahead with HADOOP-2334 (making row keys WritableComparable instead of Text) once we get a chance to breathe after getting out from under the major bugs. [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data Key: HADOOP-2329 URL: https://issues.apache.org/jira/browse/HADOOP-2329 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 A built-in data type is a fundamental data type that the hbase shell defines. (character strings, scalars, ranges, arrays, ... , etc) If you need a specialized data type that is not currently provided as a built-in type, you are encouraged to write your own user-defined data type using UDC (not yet implemented). (or contribute it for distribution in a future release of hbase shell) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.
[ https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HADOOP-2185: - Resolution: Fixed Status: Resolved (was: Patch Available) I just committed this. Thanks Konstantin! Server ports: to roll or not to roll. - Key: HADOOP-2185 URL: https://issues.apache.org/jira/browse/HADOOP-2185 Project: Hadoop Issue Type: Improvement Components: conf, dfs, mapred Affects Versions: 0.15.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 0.16.0 Attachments: FixedPorts3.patch, FixedPorts4.patch, port.stack Looked at the issues related to port rolling. My impression is that port rolling is required only for the unit tests to run. Even the name-node port should roll there, which we don't have now, in order to be able to start 2 clusters for testing, say, distcp. For real clusters, on the contrary, port rolling is not desired and sometimes even prohibited. So we should have a way to ban port rolling. My proposition is to
# use ephemeral port 0 if port rolling is desired
# if a specific port is specified then port rolling should not happen at all, meaning that a server is either able or not able to start on that particular port.
The desired port is specified via configuration parameters.
- Name-node: fs.default.name = host:port
- Data-node: dfs.datanode.port
- Job-tracker: mapred.job.tracker = host:port
- Task-tracker: mapred.task.tracker.report.bindAddress = host Task-tracker currently does not have an option to specify the port; it always uses the ephemeral port 0, and therefore I propose to add one.
- Secondary node does not need a port to listen on.
For info servers we have two sets of config variables *.info.bindAddress and *.info.port except for the task tracker, which calls them *.http.bindAddress and *.http.port instead of info. 
With respect to the info servers I propose to completely eliminate the port parameters, and form *.info.bindAddress = host:port. Info servers should do the same thing, namely start or fail on the specified port if it is not 0, and start on any free port if it is ephemeral. For the task-tracker I would rename tasktracker.http.bindAddress to mapred.task.tracker.info.bindAddress. For the data-node the info dfs.datanode.info.bindAddress should be included in the default config. Is there a reason why it is not there? This is the summary of proposed changes:
|| Server || current name = value || proposed name = value ||
| NameNode | fs.default.name = host:port | same |
| | dfs.info.bindAddress = host | dfs.http.bindAddress = host:port |
| DataNode | dfs.datanode.bindAddress = host | dfs.datanode.bindAddress = host:port |
| | dfs.datanode.port = port | eliminate |
| | dfs.datanode.info.bindAddress = host | dfs.datanode.http.bindAddress = host:port |
| | dfs.datanode.info.port = port | eliminate |
| JobTracker | mapred.job.tracker = host:port | same |
| | mapred.job.tracker.info.bindAddress = host | mapred.job.tracker.http.bindAddress = host:port |
| | mapred.job.tracker.info.port = port | eliminate |
| TaskTracker | mapred.task.tracker.report.bindAddress = host | mapred.task.tracker.report.bindAddress = host:port |
| | tasktracker.http.bindAddress = host | mapred.task.tracker.http.bindAddress = host:port |
| | tasktracker.http.port = port | eliminate |
| SecondaryNameNode | dfs.secondary.info.bindAddress = host | dfs.secondary.http.bindAddress = host:port |
| | dfs.secondary.info.port = port | eliminate |
Do we also want to set some uniform naming convention for the configuration variables? Like having hdfs instead of dfs, or info instead of http, or systematically using either datanode or data.node would make that look better in my opinion. So these are all +*api*+ changes. 
I would +*really*+ like some feedback on this, especially from people who deal with configuration issues on practice. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
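The core convention the proposal leans on — port 0 means "pick any free port" while a fixed port means "bind or fail" — is exactly how Java server sockets already behave, so no retry loop is needed on either path. A small self-contained sketch (class and method names are illustrative, not Hadoop code):

```java
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPortDemo {
    // Bind to the requested port. Port 0 asks the OS for any free port
    // (the "rolling allowed" case); a non-zero port either binds or
    // throws, which is the "no rolling" behavior the proposal wants.
    public static int bind(int requestedPort) throws IOException {
        try (ServerSocket server = new ServerSocket(requestedPort)) {
            return server.getLocalPort(); // the actual port the OS assigned
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("got ephemeral port " + bind(0));
    }
}
```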
[jira] Updated: (HADOOP-2329) [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data
[ https://issues.apache.org/jira/browse/HADOOP-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2329: -- Priority: Trivial (was: Major) Not a major issue. [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data Key: HADOOP-2329 URL: https://issues.apache.org/jira/browse/HADOOP-2329 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 A built-in data type is a fundamental data type that the hbase shell defines. (character strings, scalars, ranges, arrays, ... , etc) If you need a specialized data type that is not currently provided as a built-in type, you are encouraged to write your own user-defined data type using UDC (not yet implemented). (or contribute it for distribution in a future release of hbase shell) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2006) Aggregate Functions in select statement
[ https://issues.apache.org/jira/browse/HADOOP-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548879 ] Edward Yoon commented on HADOOP-2006: - I don't understand your comment. Please explain in more detail. Aggregate Functions in select statement --- Key: HADOOP-2006 URL: https://issues.apache.org/jira/browse/HADOOP-2006 Project: Hadoop Issue Type: Sub-task Components: contrib/hbase Affects Versions: 0.14.1 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Minor Fix For: 0.16.0 Aggregation functions on collections of data values: average, minimum, maximum, sum, count. Group rows by the value of a columnfamily and apply the aggregate function independently to each group of rows. * Grouping columnfamilies ƒ ~function_list~ (Relation) {code} select producer, avg(year) from movieLog_table group by producer {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2012) Periodic verification at the Datanode
[ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548761 ] Raghu Angadi commented on HADOOP-2012: -- This is particularly vexing when design choices could clearly be made that would avoid these issues. Our initial design did not modify metadata files on the Datanode; that was my preference too. All this stems from the fact that we are modifying these files. Periodic verification at the Datanode - Key: HADOOP-2012 URL: https://issues.apache.org/jira/browse/HADOOP-2012 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Raghu Angadi Assignee: Raghu Angadi Fix For: 0.16.0 Attachments: HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch, HADOOP-2012.patch Currently on-disk data corruption on data blocks is detected only when it is read by the client or by another datanode. These errors are detected much earlier if the datanode can periodically verify the data checksums for the local blocks. Some of the issues to consider:
- How should we check the blocks (no more often than once every couple of weeks?)
- How do we keep track of when a block was last verified (there is a .meta file associated with each block).
- What action to take once a corruption is detected
- Scanning should be done as a very low priority with the rest of the datanode disk traffic in mind.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
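The verification step itself is mechanically simple; the open questions above are about scheduling and bookkeeping. As a simplified, self-contained sketch of the check (a single CRC32 over the whole block for illustration, whereas the real datanode .meta files store per-chunk checksums):

```java
import java.util.zip.CRC32;

public class BlockVerifier {
    // Recompute a checksum over a block's bytes and compare against the
    // stored value. Illustrative only: the actual datanode format is
    // per-chunk CRCs, not one checksum per block.
    public static boolean verify(byte[] blockData, long storedChecksum) {
        CRC32 crc = new CRC32();
        crc.update(blockData, 0, blockData.length);
        return crc.getValue() == storedChecksum;
    }
}
```

A periodic scanner would call something like this per block at low I/O priority and report mismatches to the namenode for re-replication.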
RE: Reprioritize HBase issues in JIRA
Yes, that would be a big help. Go for it! And thanks for the help. --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: Bryan Duxbury [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 05, 2007 2:59 PM To: hadoop-dev@lucene.apache.org Subject: Reprioritize HBase issues in JIRA Hey all, It seems to me like there's a lot of mis-prioritized HBase issues in the JIRA at the moment, since the default is Major. I'd like to give it a once-over and reprioritize the tickets, if no one objects. I think it would make our project easier to assess at a glance. -Bryan Duxbury
[jira] Updated: (HADOOP-496) Expose HDFS as a WebDAV store
[ https://issues.apache.org/jira/browse/HADOOP-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anurag Sharma updated HADOOP-496: - Attachment: (was: fuse-j-patch.zip) Expose HDFS as a WebDAV store - Key: HADOOP-496 URL: https://issues.apache.org/jira/browse/HADOOP-496 Project: Hadoop Issue Type: New Feature Components: dfs Reporter: Michel Tourn Assignee: Enis Soztutar Attachments: hadoop-496-3.patch, hadoop-496-4.patch, hadoop-496-spool-cleanup.patch, hadoop-webdav.zip, jetty-slide.xml, lib.webdav.tar.gz, screenshot-1.jpg, slideusers.properties, webdav_wip1.patch, webdav_wip2.patch WebDAV stands for Distributed Authoring and Versioning. It is a set of extensions to the HTTP protocol that lets users collaboratively edit and manage files on a remote web server. It is often considered a replacement for NFS or SAMBA. HDFS (Hadoop Distributed File System) needs a friendly file system interface. DFSShell commands are unfamiliar. Instead, it is more convenient for Hadoop users to use a mountable network drive. A friendly interface to HDFS will be used both for casual browsing of data and for bulk import/export. The FUSE provider for HDFS is already available ( http://issues.apache.org/jira/browse/HADOOP-17 ) but it had scalability problems. WebDAV is a popular alternative. The typical licensing terms for WebDAV tools are also attractive: GPL for Linux client tools that Hadoop would not redistribute anyway. More importantly, Apache Project/Apache license for Java tools and for server components. This allows for a tighter integration with the HDFS code base. There are some interesting Apache projects that support WebDAV. 
But these are probably too heavyweight for the needs of Hadoop: Tomcat servlet: http://tomcat.apache.org/tomcat-4.1-doc/catalina/docs/api/org/apache/catalina/servlets/WebdavServlet.html Slide: http://jakarta.apache.org/slide/ Being HTTP-based and backwards-compatible with Web Browser clients, the WebDAV server protocol could even be piggy-backed on the existing Web UI ports of the Hadoop name node / data nodes. WebDAV can be hosted as (Jetty) servlets. This minimizes server code bloat and this avoids additional network traffic between HDFS and the WebDAV server. General Clients (read-only): Any web browser Linux Clients: Mountable GPL davfs2 http://dav.sourceforge.net/ FTP-like GPL Cadaver http://www.webdav.org/cadaver/ Server Protocol compliance tests: http://www.webdav.org/neon/litmus/ A goal is for Hadoop HDFS to pass this test (minus support for Properties) Pure Java clients: DAV Explorer Apache lic. http://www.ics.uci.edu/~webdav/ WebDAV also makes it convenient to add advanced features in an incremental fashion: file locking, access control lists, hard links, symbolic links. New WebDAV standards get accepted and more or less featured WebDAV clients exist. core http://www.webdav.org/specs/rfc2518.html ACLs http://www.webdav.org/specs/rfc3744.html redirects soft links http://greenbytes.de/tech/webdav/rfc4437.html BIND hard links http://www.webdav.org/bind/ quota http://tools.ietf.org/html/rfc4331 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2283) [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits)
[ https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HADOOP-2283: -- Priority: Minor (was: Major) Summary: [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits) (was: [hbase] Stuck replay of failed regionserver edits) AlreadyBeingCreatedException was seen last night in a Bryan Duxbury upload (Added ABCE to title). Committed the compaction.patch as part of HADOOP-2357. [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits) - Key: HADOOP-2283 URL: https://issues.apache.org/jira/browse/HADOOP-2283 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Assignee: stack Priority: Minor Fix For: 0.16.0 Attachments: compaction.patch, OP_READ.patch Looking in the master for a cluster of ~90 regionservers, the regionserver carrying the ROOT region went down (because it hadn't talked to the master in 30 seconds). The master notices the downed regionserver because its lease times out. It then runs the server shutdown sequence; while splitting the regionserver's edit log, it gets stuck trying to split the second of three log files. Eventually, after ~5 minutes, the second log split throws: 34974 2007-11-26 01:21:23,999 WARN hbase.HMaster - Processing pending operations: ProcessServerShutdown of XX.XX.XX.XX:60020 34975 org.apache.hadoop.dfs.AlreadyBeingCreatedException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /hbase/hregion_-1194436719/oldlogfile.log for DFSClient_610028837 on client XX.XX.XX.XX because current leaseholder is trying to recreate file. 
34976 at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:848) 34977 at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:804) 34978 at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276) 34979 at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source) 34980 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 34981 at java.lang.reflect.Method.invoke(Method.java:597) 34982 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) 34983 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) 34984 34985 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 34986 at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) 34987 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) 34988 at java.lang.reflect.Constructor.newInstance(Constructor.java:513) 34989 at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82) 34990 at org.apache.hadoop.hbase.HMaster.run(HMaster.java:1094) And so on every 5 minutes. Because the regionserver that went down had ROOT region, and because we are stuck in this eternal loop, ROOT never gets reallocated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2329) [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data
[ https://issues.apache.org/jira/browse/HADOOP-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548893 ] Edward Yoon commented on HADOOP-2329: - Are you proposing to do the data types entirely outside of HBase or leveraging HADOOP-2197? Or do you want internal support for data types? Yes, I'm thinking of the former. [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data Key: HADOOP-2329 URL: https://issues.apache.org/jira/browse/HADOOP-2329 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 A built-in data type is a fundamental data type that the hbase shell defines. (character strings, scalars, ranges, arrays, ... , etc) If you need a specialized data type that is not currently provided as a built-in type, you are encouraged to write your own user-defined data type using UDC (not yet implemented). (or contribute it for distribution in a future release of hbase shell) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: [jira] Commented: (HADOOP-2006) Aggregate Functions in select statement
it will encourage people to think that the shell is a good way to interact with HBase in general... I think this is a key point. :) The HBase Shell's aim is to improve working efficiency without demanding specialized knowledge. I'll make it an accessory for database access methods on HBase. I'm also thinking about matrix operations on HBase. But the HBase Shell is just one of the applications built on HBase. Think of it this way: if you mistakenly believe standard SQL is the whole of a DBMS's capability, you probably won't study that DBMS's storage structures, access algorithms, design philosophy, and so on. Could I then force you to use 100% of the DBMS's capability? Conversely, suppose the DBMS didn't provide standard SQL at all. Would you still want to use it? If you would, then you never thought SQL was all the DBMS had. So, my conclusion: the richer the HBase shell, the more rapidly the use of HBase will grow. -- B. Regards, Edward yoon @ NHN, corp. Home : http://www.udanax.org From: [EMAIL PROTECTED] Subject: Re: [jira] Commented: (HADOOP-2006) Aggregate Functions in select statement Date: Wed, 5 Dec 2007 15:50:50 -0800 To: hadoop-dev@lucene.apache.org If you have a table with something like a billion rows, and do an aggregate function on the table from the shell, you will end up reading all billion rows through a single machine, essentially aggregating the entire dataset locally. This defeats the purpose of having a massively distributed database like HBase. To do this more efficiently, you'd ideally kick off a MapReduce job that can perform the various aggregation functions on the dataset in parallel, harnessing the power of the distributed dataset, and then returning the results to a central location once they are calculated. I think putting this option into the shell is risky, because it will encourage people to think that the shell is a good way to interact with HBase in general, which it isn't. 
We want people to understand HBase is best consumed in parallel and discourage solutions that aggregate access through a single point. As such, we shouldn't build features that allow people to inadvertently use the wrong access patterns. On Dec 5, 2007, at 3:38 PM, Edward Yoon (JIRA) wrote: [ https://issues.apache.org/jira/browse/HADOOP-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548879 ] Edward Yoon commented on HADOOP-2006: - I don't understand your comment. Please give me more explanation. Aggregate Functions in select statement --- Key: HADOOP-2006 URL: https://issues.apache.org/jira/browse/HADOOP-2006 Project: Hadoop Issue Type: Sub-task Components: contrib/hbase Affects Versions: 0.14.1 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Minor Fix For: 0.16.0 Aggregation functions on collections of data values: average, minimum, maximum, sum, count. Group rows by the value of a column family and apply an aggregate function independently to each group of rows. * ƒ ~function_list~ (Relation) {code} select producer, avg(year) from movieLog_table group by producer {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
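The parallel aggregation described in this thread — partial (sum, count) pairs computed per split, then merged centrally — can be sketched without any Hadoop API. All class and method names below are hypothetical; in a real job, `partial` would live in a mapper/combiner and `merge` in a reducer, so only small per-group aggregates cross the network instead of every row.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative MapReduce-style computation of avg(year) group by producer.
public class GroupAverage {
    // One partial aggregate per group, as a mapper/combiner would emit.
    public static Map<String, long[]> partial(List<String[]> rows) {
        Map<String, long[]> acc = new HashMap<>();
        for (String[] row : rows) {                 // row = {producer, year}
            long[] sc = acc.computeIfAbsent(row[0], k -> new long[2]);
            sc[0] += Long.parseLong(row[1]);        // running sum
            sc[1] += 1;                             // running count
        }
        return acc;
    }

    // Reducer side: merge partials from all splits, then finish the average.
    public static Map<String, Double> merge(List<Map<String, long[]>> partials) {
        Map<String, long[]> total = new HashMap<>();
        for (Map<String, long[]> p : partials) {
            p.forEach((k, sc) -> {
                long[] t = total.computeIfAbsent(k, x -> new long[2]);
                t[0] += sc[0];
                t[1] += sc[1];
            });
        }
        Map<String, Double> avg = new HashMap<>();
        total.forEach((k, sc) -> avg.put(k, (double) sc[0] / sc[1]));
        return avg;
    }
}
```

The key property is that (sum, count) pairs are mergeable, so the average is computed exactly even though no single machine ever sees all the rows.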
[jira] Updated: (HADOOP-2311) [hbase] Could not complete hdfs write out to flush file forcing regionserver restart
[ https://issues.apache.org/jira/browse/HADOOP-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2311: -- Priority: Critical (was: Minor) Sounds serious. Changing to critical. [hbase] Could not complete hdfs write out to flush file forcing regionserver restart Key: HADOOP-2311 URL: https://issues.apache.org/jira/browse/HADOOP-2311 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Priority: Critical Attachments: delete-logging.patch I've spent some time looking into this issue but there are not enough clues in the logs to tell where the problem is. Here's what I know. Two region servers went down last night, a minute apart, during Paul Saab's 6hr run inserting 300million rows into hbase. The regionservers went down to force rerun of hlog and avoid possible data loss after a failure writing memory flushes to hdfs. Here is the lead up to the failed flush: ... 2007-11-28 22:40:02,231 INFO hbase.HRegionServer - MSG_REGION_OPEN : regionname: postlog,img149/4699/133lm0.jpg,1196318393738, startKey: img149/4699/133lm0.jpg, tableDesc: {name: postlog, families: {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}} 2007-11-28 22:40:02,242 DEBUG hbase.HStore - starting 1703405830/cookie (no reconstruction log) 2007-11-28 22:40:02,741 DEBUG hbase.HStore - maximum sequence id for hstore 1703405830/cookie is 29077708 2007-11-28 22:40:03,094 DEBUG hbase.HStore - starting 1703405830/ip (no reconstruction log) 2007-11-28 22:40:03,852 DEBUG hbase.HStore - maximum sequence id for hstore 1703405830/ip is 29077708 2007-11-28 22:40:04,138 DEBUG hbase.HRegion - Next sequence id for region postlog,img149/4699/133lm0.jpg,1196318393738 is 29077709 2007-11-28 22:40:04,141 INFO hbase.HRegion - region 
postlog,img149/4699/133lm0.jpg,1196318393738 available 2007-11-28 22:40:04,141 DEBUG hbase.HLog - changing sequence number from 21357623 to 29077709 2007-11-28 22:40:04,141 INFO hbase.HRegionServer - MSG_REGION_OPEN : regionname: postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739, startKey: img149/7512/dscnlightenedfi3.jpg, tableDesc: {name: postlog, families: {cookie:={name: cookie, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}, ip:={name: ip, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}} 2007-11-28 22:40:04,145 DEBUG hbase.HStore - starting 376748222/cookie (no reconstruction log) 2007-11-28 22:40:04,223 DEBUG hbase.HStore - maximum sequence id for hstore 376748222/cookie is 29077708 2007-11-28 22:40:04,277 DEBUG hbase.HStore - starting 376748222/ip (no reconstruction log) 2007-11-28 22:40:04,353 DEBUG hbase.HStore - maximum sequence id for hstore 376748222/ip is 29077708 2007-11-28 22:40:04,699 DEBUG hbase.HRegion - Next sequence id for region postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 is 29077709 2007-11-28 22:40:04,701 INFO hbase.HRegion - region postlog,img149/7512/dscnlightenedfi3.jpg,1196318393739 available 2007-11-28 22:40:34,427 DEBUG hbase.HRegionServer - flushing region postlog,img143/1310/yashrk3.jpg,1196317258704 2007-11-28 22:40:34,428 DEBUG hbase.HRegion - Not flushing cache for region postlog,img143/1310/yashrk3.jpg,1196317258704: snapshotMemcaches() determined that there was nothing to do 2007-11-28 22:40:55,745 DEBUG hbase.HRegionServer - flushing region postlog,img142/8773/1001417zc4.jpg,1196317258703 2007-11-28 22:40:55,745 DEBUG hbase.HRegion - Not flushing cache for region postlog,img142/8773/1001417zc4.jpg,1196317258703: snapshotMemcaches() determined that there was nothing to do 2007-11-28 22:41:04,144 DEBUG hbase.HRegionServer - flushing region postlog,img149/4699/133lm0.jpg,1196318393738 2007-11-28 22:41:04,144 DEBUG 
hbase.HRegion - Started memcache flush for region postlog,img149/4699/133lm0.jpg,1196318393738. Size 74.7k 2007-11-28 22:41:04,764 DEBUG hbase.HStore - Added 1703405830/ip/610047924323344967 with sequence id 29081563 and size 53.8k 2007-11-28 22:41:04,902 DEBUG hbase.HStore - Added 1703405830/cookie/3147798053949544972 with sequence id 29081563 and size 41.3k 2007-11-28 22:41:04,902 DEBUG hbase.HRegion - Finished memcache flush for region postlog,img149/4699/133lm0.jpg,1196318393738 in 758ms, sequenceid=29081563 2007-11-28 22:41:04,902 DEBUG hbase.HStore - compaction for HStore postlog,img149/4699/133lm0.jpg,1196318393738/ip needed. 2007-11-28 22:41:04,903
[jira] Updated: (HADOOP-1550) [hbase] No means of deleting a'row' nor all members of a column family
[ https://issues.apache.org/jira/browse/HADOOP-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-1550: -- Priority: Major (was: Minor) Seems like this is pretty important for the API to be complete. Elevating to Major. [hbase] No means of deleting a'row' nor all members of a column family -- Key: HADOOP-1550 URL: https://issues.apache.org/jira/browse/HADOOP-1550 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: stack There is no support in hbase currently for deleting a row -- i.e. remove all columns and their versions keyed by a particular row id. Nor is there a means of passing in a row id and column family name having hbase delete all members of the column family (for the designated row). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
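To make the requested semantics concrete, here is a toy in-memory model of the two missing operations. The class, its method names, and the `family:qualifier` key convention are illustrative assumptions, not HBase's actual API or storage layout.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of the missing API: delete a whole row, or all members of one
// column family within a row.
public class RowDeletes {
    // row -> (column "family:qualifier" -> value)
    private final Map<String, Map<String, byte[]>> table = new TreeMap<>();

    public void put(String row, String column, byte[] value) {
        table.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    // Remove all columns (across every family) keyed by this row id.
    public void deleteRow(String row) {
        table.remove(row);
    }

    // Remove only the members of one column family for this row.
    public void deleteFamily(String row, String family) {
        Map<String, byte[]> cols = table.get(row);
        if (cols != null) {
            cols.keySet().removeIf(c -> c.startsWith(family + ":"));
            if (cols.isEmpty()) table.remove(row);
        }
    }

    public int columnCount(String row) {
        Map<String, byte[]> cols = table.get(row);
        return cols == null ? 0 : cols.size();
    }
}
```

A real implementation would also have to tombstone every stored version of each cell, not just the latest, which is what makes this more than a client-side loop over `put`-style deletes.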
[jira] Updated: (HADOOP-2243) [hbase] getRow returns empty Map if no-such row.. should return null
[ https://issues.apache.org/jira/browse/HADOOP-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2243: -- Priority: Major (was: Minor) Results in ambiguous answer about existence of a cell, so elevating to Major. [hbase] getRow returns empty Map if no-such row.. should return null Key: HADOOP-2243 URL: https://issues.apache.org/jira/browse/HADOOP-2243 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Found by Bryan Duxbury. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [jira] Commented: (HADOOP-2006) Aggregate Functions in select statement
If you have a table with something like a billion rows, and do an aggregate function on the table from the shell, you will end up reading all billion rows through a single machine, essentially aggregating the entire dataset locally. This defeats the purpose of having a massively distributed database like HBase. To do this more efficiently, you'd ideally kick off a MapReduce job that can perform the various aggregation functions on the dataset in parallel, harnessing the power of the distributed dataset, and then returning the results to a central location once they are calculated. I think putting this option into the shell is risky, because it will encourage people to think that the shell is a good way to interact with HBase in general, which it isn't. We want people to understand HBase is best consumed in parallel and discourage solutions that aggregate access through a single point. As such, we shouldn't build features that allow people to inadvertently use the wrong access patterns. On Dec 5, 2007, at 3:38 PM, Edward Yoon (JIRA) wrote: [ https://issues.apache.org/jira/browse/HADOOP-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548879 ] Edward Yoon commented on HADOOP-2006: - I don't understand your comment. Please give me more explanation. Aggregate Functions in select statement --- Key: HADOOP-2006 URL: https://issues.apache.org/jira/browse/HADOOP-2006 Project: Hadoop Issue Type: Sub-task Components: contrib/hbase Affects Versions: 0.14.1 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Minor Fix For: 0.16.0 Aggregation functions on collections of data values: average, minimum, maximum, sum, count. Group rows by the value of a column family and apply an aggregate function independently to each group of rows. * Grouping columnfamilies ƒ ~function_list~ (Relation) {code} select producer, avg(year) from movieLog_table group by producer {code} -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2362) [hbase] Leaking hdfs file handle
[hbase] Leaking hdfs file handle Key: HADOOP-2362 URL: https://issues.apache.org/jira/browse/HADOOP-2362 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Priority: Minor Fix For: 0.16.0 Found a leaking filehandle researching HADOOP-2341. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2351) [Hbase Shell] If select command returns no result, it doesn't need to show the header information.
[ https://issues.apache.org/jira/browse/HADOOP-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548619 ] Hadoop QA commented on HADOOP-2351: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12370992/2351.patch against trunk revision r601232. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests -1. The patch failed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1269/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1269/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1269/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1269/console This message is automatically generated. [Hbase Shell] If select command returns no result, it doesn't need to show the header information. -- Key: HADOOP-2351 URL: https://issues.apache.org/jira/browse/HADOOP-2351 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Fix For: 0.16.0 Attachments: 2351.patch {code} hql select * from udanax; +-+-+-+ | Row | Column | Cell | +-+-+-+ 0 row(s) in set. (0.09 sec) hql exit; {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2350) hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows
[ https://issues.apache.org/jira/browse/HADOOP-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Bieniosek updated HADOOP-2350: -- Priority: Critical (was: Major) Bump priority because it is a correctness issue hbase scanner api returns null row names, or skips row names if different column families do not have entries for some rows --- Key: HADOOP-2350 URL: https://issues.apache.org/jira/browse/HADOOP-2350 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: Michael Bieniosek Assignee: stack Priority: Critical Fix For: 0.16.0 Attachments: TestScannerAPI.java I'm attaching a test case that fails. I noticed that if I create a table with two column families, and start a scanner on a row which only has an entry for one column family, the scanner will skip ahead to the row name for which the other column family has an entry. eg., if I insert rows so my table will look like this: {code} row - a:a - b:b aaa a:1 nil bbb a:2 b:2 ccc a:3 b:3 {code} The scanner will tell me my table looks something like this: {code} row - a:a - b:b bbb a:1 b:2 bbb a:2 b:3 {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
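The fix the attached test case implies is that a scanner must align column families by row key: take the union of row keys across the per-family stores and emit a gap (here `null`) where a family has no entry, rather than pairing the families' values positionally. A minimal sketch of that alignment, with `TreeMap`s standing in for the per-family stores (an assumption for illustration, not the actual HStore structures):

```java
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative row-key alignment across two column-family stores.
public class AlignedScan {
    // Returns rows as {rowKey, valueFromA, valueFromB}, sorted by row key;
    // a null entry means that family has no cell for the row.
    public static String[][] scan(TreeMap<String, String> famA,
                                  TreeMap<String, String> famB) {
        // Union of row keys, in sorted scan order.
        TreeSet<String> rows = new TreeSet<>(famA.keySet());
        rows.addAll(famB.keySet());
        String[][] out = new String[rows.size()][3];
        int i = 0;
        for (String row : rows) {
            out[i][0] = row;
            out[i][1] = famA.get(row); // null if this family skips the row
            out[i][2] = famB.get(row);
            i++;
        }
        return out;
    }
}
```

With the table from the report (`aaa` present only in `a:`), this yields `aaa/1/null`, `bbb/2/2`, `ccc/3/3` — never a value attributed to the wrong row name.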
[jira] Created: (HADOOP-2355) Set region split size on table creation
Set region split size on table creation --- Key: HADOOP-2355 URL: https://issues.apache.org/jira/browse/HADOOP-2355 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Reporter: Paul Saab Priority: Minor Right now the region size before a split is determined by a global configuration. It would be nice to configure tables independently of the global parameter. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2363) Unit tests fail if there is another instance of Hadoop
Unit tests fail if there is another instance of Hadoop -- Key: HADOOP-2363 URL: https://issues.apache.org/jira/browse/HADOOP-2363 Project: Hadoop Issue Type: Bug Components: test Reporter: Raghu Angadi Assignee: Konstantin Shvachko If you are running another Hadoop cluster or DFS, many unit tests fail because Namenode in MiniDFSCluster fails to bind to the right port. Most likely HADOOP-2185 forgot to set right defaults for MiniDFSCluster. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2329) [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data
[ https://issues.apache.org/jira/browse/HADOOP-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548881 ] Edward Yoon commented on HADOOP-2329: - OK, I see, Jim. But I don't understand the opposition to shell operations. :) I think there can be no cause for complaint; the shell tool isn't threatening pure HBase. [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data Key: HADOOP-2329 URL: https://issues.apache.org/jira/browse/HADOOP-2329 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 A built-in data type is a fundamental data type that the hbase shell defines. (character strings, scalars, ranges, arrays, ... , etc) If you need a specialized data type that is not currently provided as a built-in type, you are encouraged to write your own user-defined data type using UDC (not yet implemented). (or contribute it for distribution in a future release of hbase shell) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Reprioritize HBase issues in JIRA
I don't mean to ruffle any feathers. I just want to make sure that the really critical issues are labeled as such. In order to try and clarify what I think the priorities should mean, here's a wiki page I put together: http://wiki.apache.org/lucene-hadoop/Hbase/IssuePriorityGuidelines#preview If I'm way off base on these categories, let me know. -Bryan On Dec 5, 2007, at 3:18 PM, edward yoon wrote: I think this is a very subjective judgment of some application ability. -- B. Regards, Edward yoon @ NHN, corp. Home : http://www.udanax.org From: [EMAIL PROTECTED] To: hadoop-dev@lucene.apache.org Date: Wed, 5 Dec 2007 15:01:41 -0800 Subject: RE: Reprioritize HBase issues in JIRA Yes, that would be a big help. Go for it! And thanks for the help. --- Jim Kellerman, Senior Engineer; Powerset -Original Message- From: Bryan Duxbury [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 05, 2007 2:59 PM To: hadoop-dev@lucene.apache.org Subject: Reprioritize HBase issues in JIRA Hey all, It seems to me like there's a lot of mis-prioritized HBase issues in the JIRA at the moment, since the default is Major. I'd like to give it a once-over and reprioritize the tickets, if no one objects. I think it would make our project easier to assess at a glance. -Bryan Duxbury
[jira] Updated: (HADOOP-2362) [hbase] Leaking hdfs file handle
[ https://issues.apache.org/jira/browse/HADOOP-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HADOOP-2362: -- Attachment: 2362.patch HADOOP-2362 Leaking hdfs file handle M src/contrib/hbase/src/test/org/apache/hadoop/hbase/TestScanner2.java HRegion.createHRegion API changed. M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStoreFile.java (obtainNewHStoreFile): Remove duplicated code. (writeInfo): No need to wrap FSDataOutputStream in a DataOutputStream. M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStore.java No need to wrap FSDataOutputStream in a DataOutputStream. M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HRegionServer.java Remove useless log. Do explicit imports instead of importing whole packages. M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMaster.java HRegion.createHRegion API changed. M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HServerInfo.java M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HServerAddress.java M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HAbstractScanner.java M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMsg.java Do explicit imports instead of importing whole packages. M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HRegion.java Close daughter regions after opening them in split. (createHRegion): No need of initialFiles argument. [hbase] Leaking hdfs file handle Key: HADOOP-2362 URL: https://issues.apache.org/jira/browse/HADOOP-2362 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Priority: Minor Fix For: 0.16.0 Attachments: 2362.patch Found a leaking filehandle researching HADOOP-2341. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2342) create a micro-benchmark for measure local-file versus hdfs read
[ https://issues.apache.org/jira/browse/HADOOP-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548887 ] Hadoop QA commented on HADOOP-2342: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12371075/throughput.patch against trunk revision r601518. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests -1. The patch failed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1276/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1276/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1276/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1276/console This message is automatically generated. create a micro-benchmark for measure local-file versus hdfs read Key: HADOOP-2342 URL: https://issues.apache.org/jira/browse/HADOOP-2342 Project: Hadoop Issue Type: Test Components: dfs Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.16.0 Attachments: throughput.patch We should have a benchmark that measures reading a 10g file from hdfs and from local disk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
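The core of such a micro-benchmark is just timed sequential reads; a hedged sketch is below. The 64 KB buffer size and the command-line interface are assumptions — a real run would read the 10 GB file once from local disk and once through the HDFS client stream, ideally on a cold cache, and compare the two rates.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative read-throughput harness; not the HADOOP-2342 patch itself.
public class ReadThroughput {
    // Reads the whole stream in 64 KB chunks; returns MB/s.
    public static double measure(InputStream in) throws IOException {
        byte[] buf = new byte[64 * 1024];
        long bytes = 0;
        long start = System.nanoTime();
        int n;
        while ((n = in.read(buf)) > 0) {
            bytes += n;
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return (bytes / (1024.0 * 1024.0)) / Math.max(seconds, 1e-9);
    }

    public static void main(String[] args) throws IOException {
        // For a local-file run; an HDFS run would open the stream via the
        // DFS client instead.
        Path p = Path.of(args[0]);
        try (InputStream in = Files.newInputStream(p)) {
            System.out.printf("%s: %.1f MB/s%n", p, measure(in));
        }
    }
}
```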
[jira] Updated: (HADOOP-1707) Remove the DFS Client disk-based cache
[ https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur updated HADOOP-1707: - Attachment: clientDiskBuffer11.patch Merged with latest trunk. Remove the DFS Client disk-based cache -- Key: HADOOP-1707 URL: https://issues.apache.org/jira/browse/HADOOP-1707 Project: Hadoop Issue Type: Improvement Components: dfs Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.16.0 Attachments: clientDiskBuffer.patch, clientDiskBuffer10.patch, clientDiskBuffer11.patch, clientDiskBuffer2.patch, clientDiskBuffer6.patch, clientDiskBuffer7.patch, clientDiskBuffer8.patch, clientDiskBuffer9.patch, DataTransferProtocol.doc, DataTransferProtocol.html The DFS client currently uses a staging file on local disk to cache all user-writes to a file. When the staging file accumulates 1 block worth of data, its contents are flushed to a HDFS datanode. These operations occur sequentially. A simple optimization of allowing the user to write to another staging file while simultaneously uploading the contents of the first staging file to HDFS will improve file-upload performance. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
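The optimization described — let the user keep writing into a second staging buffer while the first is uploaded — is classic double buffering. A minimal sketch, using a bounded queue and a `StringBuilder` as a stand-in for the datanode sink; all names are hypothetical, and the real patch stages through files, not in-memory byte arrays.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative double-buffered writer: filling and uploading overlap.
public class DoubleBufferedWriter {
    private final BlockingQueue<byte[]> toUpload;
    private final Thread uploader;
    private final StringBuilder uploaded = new StringBuilder(); // fake sink

    public DoubleBufferedWriter() {
        // Capacity 1 means at most one full buffer waits while another fills:
        // the classic double-buffer arrangement.
        this.toUpload = new ArrayBlockingQueue<>(1);
        this.uploader = new Thread(() -> {
            try {
                while (true) {
                    byte[] block = toUpload.take();
                    if (block.length == 0) return;      // poison pill: done
                    uploaded.append(new String(block)); // "send to datanode"
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        uploader.start();
    }

    // Called when a staging buffer reaches one block's worth of data; only
    // blocks if the uploader has fallen a full buffer behind.
    public void flushBlock(byte[] block) throws InterruptedException {
        toUpload.put(block);
    }

    // Drains remaining work and returns everything "uploaded", in order.
    public String close() throws InterruptedException {
        toUpload.put(new byte[0]);
        uploader.join();
        return uploaded.toString();
    }
}
```

Because the queue preserves order and `close` waits for the uploader, the overlap never reorders or drops block data — the property the real client must also guarantee.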
[jira] Updated: (HADOOP-2362) [hbase] Leaking hdfs file handle
[ https://issues.apache.org/jira/browse/HADOOP-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HADOOP-2362: -- Status: Patch Available (was: Open) Tests pass locally. [hbase] Leaking hdfs file handle Key: HADOOP-2362 URL: https://issues.apache.org/jira/browse/HADOOP-2362 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Priority: Minor Fix For: 0.16.0 Attachments: 2362.patch Found a leaking filehandle researching HADOOP-2341. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2329) [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data
[ https://issues.apache.org/jira/browse/HADOOP-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_1254 ] Jim Kellerman commented on HADOOP-2329: --- Edward, "But, I don't know about the movement opposed to shell operations." I don't think there is opposition to what you are doing, other than some people feel that the advanced shell operations are not necessary in a basic shell that can do simple queries and administrative functions. If the advanced features could be packaged in a separate jar and loaded via some command line option, I think it would gain higher acceptance. "I think there can be no cause for complaint. The shell tool isn't threatening a pure Hbase." I think I am misunderstanding something here. Are you proposing to do the data types entirely outside of HBase or leveraging HADOOP-2197? Or do you want internal support for data types? If you are thinking of the former, that's fine. But I don't think support for data types should be in the core of HBase. [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data Key: HADOOP-2329 URL: https://issues.apache.org/jira/browse/HADOOP-2329 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 A built-in data type is a fundamental data type that the hbase shell defines (character strings, scalars, ranges, arrays, etc.). If you need a specialized data type that is not currently provided as a built-in type, you are encouraged to write your own user-defined data type using UDC (not yet implemented), or contribute it for distribution in a future release of hbase shell. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-2343) [hbase] Stuck regionserver?
[ https://issues.apache.org/jira/browse/HADOOP-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury updated HADOOP-2343: -- Priority: Major (was: Minor) Affects cluster stability, but cluster recovers on restart, so changing to major. [hbase] Stuck regionserver? --- Key: HADOOP-2343 URL: https://issues.apache.org/jira/browse/HADOOP-2343 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Assignee: stack Looking in logs, a regionserver went down because it could not contact the master after 60 seconds. Watching logging, the HRS is repeatedly checking all 150 loaded regions over and over again w/ a pause of about 5 seconds between runs... then there is a suspicious 60+ second gap with no logging as though the regionserver had hung up on something: {code} 2007-12-03 13:14:54,178 DEBUG hbase.HRegionServer - flushing region postlog,img151/60/plakatlepperduzy1hh7.jpg,1196614355635 2007-12-03 13:14:54,178 DEBUG hbase.HRegion - Not flushing cache for region postlog,img151/60/plakatlepperduzy1hh7.jpg,1196614355635: snapshotMemcaches() determined that there was nothing to do 2007-12-03 13:14:54,205 DEBUG hbase.HRegionServer - flushing region postlog,img247/230/seanpaul4li.jpg,1196615889965 2007-12-03 13:14:54,205 DEBUG hbase.HRegion - Not flushing cache for region postlog,img247/230/seanpaul4li.jpg,1196615889965: snapshotMemcaches() determined that there was nothing to do 2007-12-03 13:16:04,305 FATAL hbase.HRegionServer - unable to report to master for 67467 milliseconds - aborting server 2007-12-03 13:16:04,455 INFO hbase.Leases - regionserver/0:0:0:0:0:0:0:0:60020 closing leases 2007-12-03 13:16:04,455 INFO hbase.Leases$LeaseMonitor - regionserver/0:0:0:0:0:0:0:0:60020.leaseChecker exiting {code} Master seems to be running fine scanning its ~700 regions. Then you see this in log, before the HRS shuts itself down. 
{code} 2007-12-03 13:14:31,416 INFO hbase.Leases - HMaster.leaseChecker lease expired 153260899/153260899 2007-12-03 13:14:31,417 INFO hbase.HMaster - XX.XX.XX.102:60020 lease expired {code} ... and we go on to process shutdown. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HADOOP-2364) when hbase regionserver restarts, it says impossible state for createLease()
when hbase regionserver restarts, it says impossible state for createLease() -- Key: HADOOP-2364 URL: https://issues.apache.org/jira/browse/HADOOP-2364 Project: Hadoop Issue Type: Bug Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Michael Bieniosek Priority: Minor I restarted a regionserver, and got this error in its logs: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.AssertionError: Impossible state for createLease(): Lease -435227488/-435227488 is still held. at org.apache.hadoop.hbase.Leases.createLease(Leases.java:145) at org.apache.hadoop.hbase.HMaster.regionServerStartup(HMaster.java:1278) at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596) at org.apache.hadoop.ipc.Client.call(Client.java:482) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184) at $Proxy0.regionServerStartup(Unknown Source) at org.apache.hadoop.hbase.HRegionServer.reportForDuty(HRegionServer.java:1025) at org.apache.hadoop.hbase.HRegionServer.run(HRegionServer.java:659) at java.lang.Thread.run(Unknown Source) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2329) [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data
[ https://issues.apache.org/jira/browse/HADOOP-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548906 ] Jim Kellerman commented on HADOOP-2329: --- Since you are proposing the former rather than the latter, I would say go for it. [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data Key: HADOOP-2329 URL: https://issues.apache.org/jira/browse/HADOOP-2329 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 A built-in data type is a fundamental data type that the hbase shell defines (character strings, scalars, ranges, arrays, etc.). If you need a specialized data type that is not currently provided as a built-in type, you are encouraged to write your own user-defined data type using UDC (not yet implemented), or contribute it for distribution in a future release of hbase shell. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2362) [hbase] Leaking hdfs file handle
[ https://issues.apache.org/jira/browse/HADOOP-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548907 ] Hadoop QA commented on HADOOP-2362: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12371086/2362.patch against trunk revision r601518. @author +1. The patch does not contain any @author tags. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new compiler warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests -1. The patch failed contrib unit tests. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1277/testReport/ Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1277/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1277/artifact/trunk/build/test/checkstyle-errors.html Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1277/console This message is automatically generated. [hbase] Leaking hdfs file handle Key: HADOOP-2362 URL: https://issues.apache.org/jira/browse/HADOOP-2362 Project: Hadoop Issue Type: Bug Components: contrib/hbase Reporter: stack Priority: Minor Fix For: 0.16.0 Attachments: 2362.patch Found a leaking filehandle researching HADOOP-2341. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-2329) [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data
[ https://issues.apache.org/jira/browse/HADOOP-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548908 ] Edward Yoon commented on HADOOP-2329: - Thanks for your advice. [Hbase Shell] Addition of Built-In Value Data Types for efficient accessing and storing data Key: HADOOP-2329 URL: https://issues.apache.org/jira/browse/HADOOP-2329 Project: Hadoop Issue Type: New Feature Components: contrib/hbase Affects Versions: 0.16.0 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Trivial Fix For: 0.16.0 A built-in data type is a fundamental data type that the hbase shell defines (character strings, scalars, ranges, arrays, etc.). If you need a specialized data type that is not currently provided as a built-in type, you are encouraged to write your own user-defined data type using UDC (not yet implemented), or contribute it for distribution in a future release of hbase shell. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-1707) Remove the DFS Client disk-based cache
[ https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548911 ] Konstantin Shvachko commented on HADOOP-1707: - I think this patch has been tested quite thoroughly, and I don't see any algorithmic flaws in it. The logic is fairly complicated though, so imo (1) we need better documentation, either in JavaDoc or at least in Jira; (2) it would be good if you could extract common actions for the client and the data-node into separate classes, not inner ones. === DFSClient.java - DFSClient: 4 unused variables, members. - DFSOutputStream.lb should be a local variable. - processDatanodeError() and DFSOutputStream.close() have common code. - BlockReader.readChunk() {code} 07/12/04 18:36:22 INFO fs.FSInputChecker: DFSClient readChunk got seqno 14 offsetInBlock 7168 {code} Should be DEBUG. - More comments: what are dataQueue, ackQueue, bytesCurBlock? - Some new members in DFSOutputStream can be calculated from the others. No need to store them all. See e.g. {code} private int packetSize = 0; private int chunksPerPacket = 0; private int chunksPerBlock = 0; private int chunkSize = 0; {code} - In the line below, 8 should be defined as a constant; otherwise its meaning is not clear. {code} chunkSize = bytesPerChecksum + 8; // user data + checksum {code} - currentPacket should be a local variable of writeChunk(). - The 4 in the code snippet below looks mysterious: {code} if (len + cklen + 4 > chunkSize) { {code} - Why start ResponseProcessor in processDatanodeError()? - Some methods should be moved into new inner classes; e.g. nextBlockOutputStream() should be a part of DataStreamer. - Packet should be factored out to a separate class (named probably DataPacket). It should have serialization/deserialization methods for the packet header, which should be reused in DFSClient and DataNodes for consistency in data transfer.
It also should have methods readPacket() and writePacket(). === DataNode.java - import org.apache.hadoop.io.Text; is redundant. - My Eclipse shows 5 variables that are never read. - Rather than using 4 on several occasions, a constant should be defined {code} SIZE_OF_INTEGER = Integer.SIZE / Byte.SIZE; {code} and used whenever required. - lastDataNodeRun() should not be public. === FSDataset.java - writeToBlock(): these are two searches in a map instead of one. {code} if (ongoingCreates.containsKey(b)) { ActiveFile activeFile = ongoingCreates.get(b); {code} - unfinalizeBlock(): I kinda find the name funny. === General - Convert comments like // ... to JavaDoc /** ... */ style comments when used as method or class headers, even if they are private. - Formatting: tabs should be replaced by 2 spaces, e.g. in ResponseProcessor.run(), DataStreamer.run(). - Formatting: long lines. Remove the DFS Client disk-based cache -- Key: HADOOP-1707 URL: https://issues.apache.org/jira/browse/HADOOP-1707 Project: Hadoop Issue Type: Improvement Components: dfs Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.16.0 Attachments: clientDiskBuffer.patch, clientDiskBuffer10.patch, clientDiskBuffer11.patch, clientDiskBuffer2.patch, clientDiskBuffer6.patch, clientDiskBuffer7.patch, clientDiskBuffer8.patch, clientDiskBuffer9.patch, DataTransferProtocol.doc, DataTransferProtocol.html The DFS client currently uses a staging file on local disk to cache all user-writes to a file. When the staging file accumulates 1 block worth of data, its contents are flushed to a HDFS datanode. These operations occur sequentially. A simple optimization of allowing the user to write to another staging file while simultaneously uploading the contents of the first staging file to HDFS will improve file-upload performance. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
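Konstantin's suggestion about the magic 4 is a standard refactor; a minimal sketch of it follows, where the class name and the chunkFits() helper are hypothetical illustrations, not code from the patch:

```java
// Replace scattered magic 4s with a named constant for the size of a
// serialized int (e.g. a length field alongside data and checksum bytes).
public class PacketConstants {
    public static final int SIZE_OF_INTEGER = Integer.SIZE / Byte.SIZE; // 4 bytes

    // Illustrative use: does one more chunk (data + checksum + int field) fit?
    public static boolean chunkFits(int len, int cklen, int chunkSize) {
        return len + cklen + SIZE_OF_INTEGER <= chunkSize;
    }
}
```

The gain is purely readability: the comparison now states what the 4 bytes are for, and a change in the field width touches one definition.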
RE: [jira] Commented: (HADOOP-2006) Aggregate Functions in select statement
Sorry for my mistake... What I meant was: if you assume that standard SQL is the whole of a DBMS's capability, then you won't want to study that DBMS's structure, access algorithms, philosophies, etc. Can I force you to use 100% of the DBMS's capability? Or, let's assume the DBMS didn't provide standard SQL. Would you still want to use it? If you would, then you already understand that SQL isn't all there is to the DBMS. So, the conclusion? The richer the hbase shell, the more rapidly the use of hbase will grow. -- B. Regards, Edward yoon @ NHN, corp. Home : http://www.udanax.org From: [EMAIL PROTECTED] To: hadoop-dev@lucene.apache.org Subject: RE: [jira] Commented: (HADOOP-2006) Aggregate Functions in select statement Date: Thu, 6 Dec 2007 01:10:14 + "it will encourage people to think that the shell is a good way to interact with HBase in general..." I think this is a key point. :) The Hbase Shell's aim is to improve work efficiency without requiring specialized knowledge. I'll make an accessory for database access methods on Hbase. Also, I'm thinking about matrix operations on Hbase. But Hbase Shell is just one of the applications on Hbase. Let's think: if you assume that standard SQL is the whole of a DBMS's capability, then you won't want to study that DBMS's structure, access algorithms, philosophies, etc. Can I force you to use 100% of the DBMS's capability? Or, let's assume the DBMS didn't provide standard SQL. Would you still want to use it? If you would, then you already understand that SQL isn't all there is to the DBMS. So, the conclusion? The richer the hbase shell, the more rapidly the use of hbase will grow. -- B. Regards, Edward yoon @ NHN, corp.
Home : http://www.udanax.org From: [EMAIL PROTECTED] Subject: Re: [jira] Commented: (HADOOP-2006) Aggregate Functions in select statement Date: Wed, 5 Dec 2007 15:50:50 -0800 To: hadoop-dev@lucene.apache.org If you have a table with something like a billion rows, and do an aggregate function on the table from the shell, you will end up reading all billion rows through a single machine, essentially aggregating the entire dataset locally. This defeats the purpose of having a massively distributed database like HBase. To do this more efficiently, you'd ideally kick off a MapReduce job that can perform the various aggregation functions on the dataset in parallel, harnessing the power of the distributed dataset, and then returning the results to a central location once they are calculated. I think putting this option into the shell is risky, because it will encourage people to think that the shell is a good way to interact with HBase in general, which it isn't. We want people to understand HBase is best consumed in parallel and discourage solutions that aggregate access through a single point. As such, we shouldn't build features that allow people to inadvertently use the wrong access patterns. On Dec 5, 2007, at 3:38 PM, Edward Yoon (JIRA) wrote: [ https://issues.apache.org/jira/browse/HADOOP-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548879 ] Edward Yoon commented on HADOOP-2006: - I don't understand your comment. Please explain further. Aggregate Functions in select statement --- Key: HADOOP-2006 URL: https://issues.apache.org/jira/browse/HADOOP-2006 Project: Hadoop Issue Type: Sub-task Components: contrib/hbase Affects Versions: 0.14.1 Reporter: Edward Yoon Assignee: Edward Yoon Priority: Minor Fix For: 0.16.0 Aggregation functions on collections of data values: average, minimum, maximum, sum, count. Group rows by the value of a column family and apply an aggregate function independently to each group of rows.
* ƒ ~function_list~ (Relation) {code} select producer, avg(year) from movieLog_table group by producer {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
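The `select producer, avg(year) ... group by producer` example can be mimicked locally in plain Java to make concrete what the aggregation computes; at HBase scale this grouping key would become the map output key and the averaging the reduce step of the MapReduce job described above. The class and row layout here are illustrative assumptions:

```java
import java.util.*;
import java.util.stream.Collectors;

// Local sketch of `select producer, avg(year) ... group by producer`:
// group rows by the producer column and average the year column.
public class GroupAvg {
    // each row is {producer, year}
    public static Map<String, Double> avgYearByProducer(List<String[]> rows) {
        return rows.stream().collect(Collectors.groupingBy(
            r -> r[0],                                            // group key: producer
            Collectors.averagingDouble(r -> Double.parseDouble(r[1])))); // avg(year)
    }
}
```

Pulled into the shell, this whole computation runs on one machine; distributed across reducers keyed by producer, each group is averaged in parallel, which is the access pattern the thread argues for.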
[jira] Assigned: (HADOOP-1550) [hbase] No means of deleting a 'row' nor all members of a column family
[ https://issues.apache.org/jira/browse/HADOOP-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Duxbury reassigned HADOOP-1550: - Assignee: Bryan Duxbury [hbase] No means of deleting a 'row' nor all members of a column family -- Key: HADOOP-1550 URL: https://issues.apache.org/jira/browse/HADOOP-1550 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: stack Assignee: Bryan Duxbury There is no support in hbase currently for deleting a row -- i.e. removing all columns and their versions keyed by a particular row id. Nor is there a means of passing in a row id and column family name and having hbase delete all members of the column family (for the designated row). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Work started: (HADOOP-1550) [hbase] No means of deleting a 'row' nor all members of a column family
[ https://issues.apache.org/jira/browse/HADOOP-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HADOOP-1550 started by Bryan Duxbury. [hbase] No means of deleting a 'row' nor all members of a column family -- Key: HADOOP-1550 URL: https://issues.apache.org/jira/browse/HADOOP-1550 Project: Hadoop Issue Type: Improvement Components: contrib/hbase Reporter: stack Assignee: Bryan Duxbury There is no support in hbase currently for deleting a row -- i.e. removing all columns and their versions keyed by a particular row id. Nor is there a means of passing in a row id and column family name and having hbase delete all members of the column family (for the designated row). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Jaql: a JSON query language
IBM Almaden is pleased to announce Jaql, a query language for JSON data. An introduction to Jaql and a prototype that integrates Jaql with Hadoop's map/reduce and HBase is available at http://www.jaql.org. A more detailed technical description is forthcoming. Jaql is still an early draft specification, so beware that it is likely to change over the next few months. Enjoy!