[jira] [Commented] (ACCUMULO-1124) optimize index size in RFile

2016-05-23 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297642#comment-15297642
 ] 

Josh Elser commented on ACCUMULO-1124:
--

bq. I experimented with shortening keys in the index and that gave some nice 
improvements, but not as much as I expected. I realized that even with those 
changes, bad keys were still being placed in the index. I added code to keep 
statistics on key sizes and used those statistics to try to select keys that 
were <=AVG(keySize). I also excluded keys that were too big (greater than 3 std 
dev from the mean).

I had the thought "how would we determine when index size is efficient" in the 
future (both evaluating the success of this change as well as identifying perf 
issues in the future). Did you give any thought about how we could expose this 
information more easily? Maybe we include some extra information in the file 
entry in metadata so that the master/monitor could easily aggregate/report on 
file statistics? Not suggesting it needs to happen now, but wondering your 
thoughts (since I assume you were doing all this investigation by hand).

> optimize index size in RFile
> 
>
> Key: ACCUMULO-1124
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1124
> Project: Accumulo
>  Issue Type: Improvement
>Reporter: Eric Newton
>Assignee: Keith Turner
> Fix For: 1.8.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I noticed HBASE-7845 and it seems like something we could do in RFile, too.
> Instead of putting the whole key in the index, you put in enough of the key 
> to get the reader to the beginning of the block.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Accumulo-1.8-Integration-Tests - Build # 968 - Unstable! -- 1.6

2016-05-23 Thread elserj
Accumulo-1.8-Integration-Tests - Build # 968 - Unstable:

Check console output at 
https://secure.penguinsinabox.com/jenkins/job/Accumulo-1.8-Integration-Tests/968/
 to view the results.

Accumulo-Pull-Requests - Build # 288 - Aborted

2016-05-23 Thread Apache Jenkins Server
The Apache Jenkins build system has built Accumulo-Pull-Requests (build #288)

Status: Aborted

Check console output at 
https://builds.apache.org/job/Accumulo-Pull-Requests/288/ to view the results.

[jira] [Created] (ACCUMULO-4314) Use statistics to choose better keys for RFile index

2016-05-23 Thread Keith Turner (JIRA)
Keith Turner created ACCUMULO-4314:
--

 Summary: Use statistics to choose better keys for RFile index
 Key: ACCUMULO-4314
 URL: https://issues.apache.org/jira/browse/ACCUMULO-4314
 Project: Accumulo
  Issue Type: Improvement
Reporter: Keith Turner
Assignee: Keith Turner
 Fix For: 1.6.6, 1.7.2


The commit for ACCUMULO-1124 makes two changes:
  * Generates shorter keys that may not exist in data to place in RFile index
  * Uses statistics to make better choices about what keys to place in index.  
These changes look for keys that are of average size or smaller and exclude 
large keys (keys that are > 3 std dev from the mean).

The change to generate shorter keys cannot be made in 1.7.X and 1.6.X because 
it would generate RFiles that may not work properly with older 1.6 and 1.7 
versions.   However, the changes to use statistics to pick better keys could be 
made in 1.6 and 1.7. 
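
As an illustration of the first change (shorter index keys that may not exist 
in the data), here is a minimal sketch. It assumes keys are flat byte arrays 
compared as unsigned bytes, and it returns the shortest prefix of the next 
block's first key that still sorts after the previous block's last key. The 
class and method names are hypothetical; the actual commit may work on the 
individual fields of an Accumulo Key and may choose its separator differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class IndexKeyShortener {

  /** Lexicographic comparison treating each byte as unsigned. */
  public static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  }

  /**
   * Returns the shortest byte string s with prev < s <= next.
   * The result is a prefix of next and may not exist in the data.
   */
  public static byte[] shorten(byte[] prev, byte[] next) {
    if (compareUnsigned(prev, next) >= 0) {
      throw new IllegalArgumentException("prev must sort before next");
    }
    int n = Math.min(prev.length, next.length);
    int i = 0;
    while (i < n && prev[i] == next[i]) {
      i++;
    }
    // i is the first differing position, or prev.length when prev is a
    // proper prefix of next. One byte past the shared prefix is enough.
    return Arrays.copyOf(next, i + 1);
  }

  public static void main(String[] args) {
    byte[] prev = "row_0001_abcdef".getBytes(StandardCharsets.UTF_8);
    byte[] next = "row_0002_zzzzzz".getBytes(StandardCharsets.UTF_8);
    // prints "row_0002"
    System.out.println(new String(shorten(prev, next), StandardCharsets.UTF_8));
  }
}
```

Because the shortened key need not match any stored key, a reader that assumes 
every index key is present in the data could misbehave, which is consistent 
with the compatibility concern described above.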





[jira] [Commented] (ACCUMULO-1124) optimize index size in RFile

2016-05-23 Thread Keith Turner (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297313#comment-15297313
 ] 

Keith Turner commented on ACCUMULO-1124:


I experimented with shortening keys in the index and that gave some nice 
improvements, but not as much as I expected.  I realized that even with those 
changes, bad keys were still being placed in the index.  I added code to keep 
statistics on key sizes and used those statistics to try to select keys that 
were <=AVG(keySize).  I also excluded keys that were too big (greater than 3 
std dev from the mean).  With the key shortening and statistics changes I was 
able to reduce the index size for the file in my previous comment to that below.

{noformat}
RFile Version: 8

Locality group   : 
Num   blocks   : 21,758
Index level 1  : 3,048 bytes  1 blocks
Index level 0  : 1,873,885 bytes  8 blocks
First key  : um:d:385:%03;%01;10.30.170.244>>o>/2954%af; 
data:current [] 4611686019157309597 false
Last key   : um:d:395:%03;%01;%ff; 
com.facebook>.www>s>/dialog/feed?app_id=90376669494... TRUNCATED data:current 
[] -6917529026891043602 false
Num entries: 24,299,468
Column families: [data]

Meta block : BCFile.index
  Raw size : 4 bytes
  Compressed size  : 12 bytes
  Compression type : gz

Meta block : RFile.index
  Raw size : 3,163 bytes
  Compressed size  : 1,515 bytes
  Compression type : gz
{noformat}

At first I thought I could make these changes in 1.6 and 1.7.  However, while 
working on this I realized the key shortening change is a breaking change, in 
that older RFile code would not be able to handle keys in the index that do not 
exist in the data.   The changes to use statistics to choose better keys could 
be made in 1.6 and 1.7.
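
The statistics-based selection described above can be sketched as follows. 
This is only an illustration under assumptions: the class and method names are 
hypothetical, the running mean and standard deviation are kept with Welford's 
method (the comment does not say how the statistics are computed), and how the 
writer acts on these two tests when closing a block is not shown.

```java
public class KeySizeStats {
  private long count;
  private double mean;
  private double m2; // running sum of squared deviations from the mean

  /** Records the size of one key (Welford's online update). */
  public void add(int keySize) {
    count++;
    double delta = keySize - mean;
    mean += delta / count;
    m2 += delta * (keySize - mean);
  }

  public double mean() {
    return mean;
  }

  /** Population standard deviation of the key sizes seen so far. */
  public double stdDev() {
    return count > 0 ? Math.sqrt(m2 / count) : 0.0;
  }

  /** True when the key is no larger than the average key size. */
  public boolean isPreferred(int keySize) {
    return count > 0 && keySize <= mean;
  }

  /** True when the key is more than 3 standard deviations above the mean. */
  public boolean isTooLarge(int keySize) {
    return count > 0 && keySize > mean + 3 * stdDev();
  }

  public static void main(String[] args) {
    KeySizeStats stats = new KeySizeStats();
    for (int size : new int[] {8, 10, 12, 8, 10, 12}) {
      stats.add(size);
    }
    System.out.println("mean=" + stats.mean() + " stdDev=" + stats.stdDev());
    System.out.println("size 9 preferred: " + stats.isPreferred(9));
    System.out.println("size 15 too large: " + stats.isTooLarge(15));
  }
}
```

A writer could prefer to end a block on a key for which isPreferred is true 
and avoid indexing a key for which isTooLarge is true. Since this only affects 
which existing key is chosen, every index key still exists in the data.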

> optimize index size in RFile
> 
>
> Key: ACCUMULO-1124
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1124
> Project: Accumulo
>  Issue Type: Improvement
>Reporter: Eric Newton
>Assignee: Keith Turner
> Fix For: 1.8.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I noticed HBASE-7845 and it seems like something we could do in RFile, too.
> Instead of putting the whole key in the index, you put in enough of the key 
> to get the reader to the beginning of the block.





Accumulo-1.7 - Build # 233 - Aborted

2016-05-23 Thread Apache Jenkins Server
The Apache Jenkins build system has built Accumulo-1.7 (build #233)

Status: Aborted

Check console output at https://builds.apache.org/job/Accumulo-1.7/233/ to view 
the results.

[jira] [Created] (ACCUMULO-4313) Improve Accumulo website

2016-05-23 Thread Mike Walch (JIRA)
Mike Walch created ACCUMULO-4313:


 Summary: Improve Accumulo website
 Key: ACCUMULO-4313
 URL: https://issues.apache.org/jira/browse/ACCUMULO-4313
 Project: Accumulo
  Issue Type: Improvement
Reporter: Mike Walch
Assignee: Mike Walch
Priority: Minor


Some issues:
* Page width is not restricted. 
* Accumulo logo is not used in navbar.
* Nav bar links need to be organized better
* Home page is too verbose and could be simplified
* Footer has too much wording/legalese. 
* ASF links need to exist on the website but could be put in their own section

These issues are all aesthetic/subjective so feel free to comment or disagree.





Accumulo-Master - Build # 1866 - Aborted

2016-05-23 Thread Apache Jenkins Server
The Apache Jenkins build system has built Accumulo-Master (build #1866)

Status: Aborted

Check console output at https://builds.apache.org/job/Accumulo-Master/1866/ to 
view the results.

Accumulo-1.6 - Build # 984 - Fixed

2016-05-23 Thread Apache Jenkins Server
The Apache Jenkins build system has built Accumulo-1.6 (build #984)

Status: Fixed

Check console output at https://builds.apache.org/job/Accumulo-1.6/984/ to view 
the results.

Accumulo-1.8 - Build # 13 - Aborted

2016-05-23 Thread Apache Jenkins Server
The Apache Jenkins build system has built Accumulo-1.8 (build #13)

Status: Aborted

Check console output at https://builds.apache.org/job/Accumulo-1.8/13/ to view 
the results.

[jira] [Commented] (ACCUMULO-4164) Avoid copy of RFile Index blocks when in cache

2016-05-23 Thread Adam Fuchs (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296890#comment-15296890
 ] 

Adam Fuchs commented on ACCUMULO-4164:
--

I would love to see the perf test results for this change. Can you post them, 
[~kturner]?

> Avoid copy of RFile Index blocks when in cache
> --
>
> Key: ACCUMULO-4164
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4164
> Project: Accumulo
>  Issue Type: Improvement
>Affects Versions: 1.6.5, 1.7.1
>Reporter: Keith Turner
>Assignee: Keith Turner
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I have been doing performance experiments with RFile.  During the course of 
> these experiments I noticed that RFile is not as fast as it should be in the 
> case where index blocks are in cache and the RFile is not already open.  The 
> reason is that the RFile code copies and deserializes the index data even 
> though it's already in memory.
> I made the following changes to RFile in a branch.
>  * Avoid copy of index data when it's in cache
>  * Deserialize offsets lazily (instead of upfront) during binary search
>  * Stopped calling lots of synchronized methods during deserialization of 
> index info.  The existing code uses ByteArrayInputStream which results in lots 
> of fine grained synchronization.  Switching to an inputstream that offers the 
> same functionality w/o sync showed a measurable performance difference.  
> These changes lead to better performance in the following two situations:
>  * When an RFile's data is in cache, but it's not open on the tserver.  
>  * For RFiles with multilevel indexes with index data in cache.   Currently 
> an open RFile only keeps the root node in memory.   Lower level index nodes 
> are always read from the cache or DFS.   The changes I made would always 
> avoid the copy and deserialization of lower level index nodes when in cache.
> I have seen significant performance improvements testing with the two cases 
> above.  My tests are currently based on a new API I am creating for RFile, so 
> I cannot easily share them until I get that pushed.  
> For the case where a tserver has all files frequently in use already open and 
> those files have a single level index, these changes should not make a 
> significant performance difference.
> These changes should result in less memory use for opening the same rfile 
> multiple times for different scans (when data is in cache).  In this case all 
> of the RFiles would share the same byte array holding the serialized index 
> data.
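
To illustrate the ideas in the description above (no copy of cached index 
data, lazy decoding of offsets during binary search, and no synchronized 
stream reads), here is a minimal sketch. The serialized layout is a made-up 
one for this example and is not the RFile index format: an entry count, a 
table of entry offsets, then length-prefixed keys. The class and method names 
are hypothetical. ByteBuffer.wrap shares the given array rather than copying 
it, and the absolute getInt calls are not synchronized.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class LazyIndexSearch {

  /**
   * Builds the hypothetical layout: [int n][n x int entryOffset][entries],
   * where each entry is [int keyLen][key bytes]. Keys must already be sorted
   * in unsigned byte order.
   */
  public static byte[] serialize(String[] sortedKeys) {
    int n = sortedKeys.length;
    byte[][] raw = new byte[n][];
    int size = 4 + 4 * n;
    for (int i = 0; i < n; i++) {
      raw[i] = sortedKeys[i].getBytes(StandardCharsets.UTF_8);
      size += 4 + raw[i].length;
    }
    ByteBuffer buf = ByteBuffer.allocate(size);
    buf.putInt(n);
    int off = 4 + 4 * n;
    for (byte[] k : raw) {
      buf.putInt(off);
      off += 4 + k.length;
    }
    for (byte[] k : raw) {
      buf.putInt(k.length);
      buf.put(k);
    }
    return buf.array();
  }

  /** Returns the position of the first key >= target, or n if there is none. */
  public static int search(byte[] cached, byte[] target) {
    ByteBuffer buf = ByteBuffer.wrap(cached); // shares the cached array
    int lo = 0;
    int hi = buf.getInt(0);
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      // Only the offsets of probed entries are ever decoded.
      int entryOff = buf.getInt(4 + 4 * mid);
      int keyLen = buf.getInt(entryOff);
      if (compare(cached, entryOff + 4, keyLen, target) < 0) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /** Compares a key stored in place against target, as unsigned bytes. */
  static int compare(byte[] data, int off, int len, byte[] target) {
    int n = Math.min(len, target.length);
    for (int i = 0; i < n; i++) {
      int d = (data[off + i] & 0xff) - (target[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return len - target.length;
  }

  public static void main(String[] args) {
    byte[] cached = serialize(new String[] {"apple", "banana", "cherry"});
    System.out.println(search(cached, "banana".getBytes(StandardCharsets.UTF_8)));
  }
}
```

Since search only reads from the shared array, several readers could search 
the same cached bytes at once, which matches the memory-sharing point made at 
the end of the description.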





[jira] [Commented] (ACCUMULO-3470) Upgrade to Commons VFS 2.1

2016-05-23 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296813#comment-15296813
 ] 

Dave Marion commented on ACCUMULO-3470:
---

I removed ReadOnlyHdfsFileProviderTest in 1.7 and beyond. I think my work is 
done here.

> Upgrade to Commons VFS 2.1
> --
>
> Key: ACCUMULO-3470
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3470
> Project: Accumulo
>  Issue Type: Task
>Reporter: Dave Marion
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0, 2.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Commons VFS 2.1 is nearing release. When released we need to remove the VFS 
> related classes in the start module, update the imports, and update the 
> version in the pom. Will set fixVersions when VFS is released.





[jira] [Commented] (ACCUMULO-3470) Upgrade to Commons VFS 2.1

2016-05-23 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296791#comment-15296791
 ] 

Dave Marion commented on ACCUMULO-3470:
---

Ok, I reverted the commit for updating Commons VFS from 2.0 to 2.1 in the 
Accumulo 1.6 branch. I merged that change up to 1.7, reverted the revert 
commit, and merged that all the way up to master.

> Upgrade to Commons VFS 2.1
> --
>
> Key: ACCUMULO-3470
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3470
> Project: Accumulo
>  Issue Type: Task
>Reporter: Dave Marion
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0, 2.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Commons VFS 2.1 is nearing release. When released we need to remove the VFS 
> related classes in the start module, update the imports, and update the 
> version in the pom. Will set fixVersions when VFS is released.





Accumulo-1.8-Integration-Tests - Build # 967 - Failure! -- master

2016-05-23 Thread elserj
Accumulo-1.8-Integration-Tests - Build # 967 - Failure:

Check console output at 
https://secure.penguinsinabox.com/jenkins/job/Accumulo-1.8-Integration-Tests/967/
 to view the results.