Andrew Musselman created MAHOUT-2142:
----------------------------------------

             Summary: Discussion and planning epic for adding blockchain data 
sources and analytics use cases
                 Key: MAHOUT-2142
                 URL: https://issues.apache.org/jira/browse/MAHOUT-2142
             Project: Mahout
          Issue Type: Epic
            Reporter: Andrew Musselman
            Assignee: Andrew Musselman


*About*

The proposal is to add a new data source, namely any number of 
Ethereum-compatible ledgers, and to pick a few compelling use cases to build 
out this year.

We will add children to this epic for specific work items.

*Example Use Cases*
 # Search indexes of given ledgers
 # Computed similarity to other accounts on the same ledger based on activity 
history
 # Time-series analysis of gas (transaction) fees across multiple ledgers
 # Time-series analysis of transactions (overall count per week/month/year/custom 
period, by user account, etc.) across a list of ledgers, for comparative 
analysis of usage
 # Max/Min range of transactions for different ledgers
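The time-series and min/max use cases above mostly come down to bucketing transactions by ledger and period and then aggregating. A minimal sketch, assuming transactions have already been flattened into hypothetical (ledger, unix_timestamp, fee_in_wei) tuples; `period_key` and `summarize` are illustrative names, not existing Mahout APIs:

```python
from collections import defaultdict
from datetime import datetime, timezone


def period_key(ts, period):
    """Map a Unix timestamp (seconds, UTC) to a period label.

    `period` is "week", "month", "year", or an int bucket width in seconds
    (the "custom period" case); custom buckets are labeled by their start time.
    """
    if isinstance(period, int):
        return str(ts - ts % period)
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    if period == "year":
        return f"{dt.year}"
    if period == "month":
        return f"{dt.year}-{dt.month:02d}"
    if period == "week":
        iso = dt.isocalendar()  # ISO calendar: (iso_year, iso_week, weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    raise ValueError(f"unsupported period: {period!r}")


def summarize(txs, period="month"):
    """Group (ledger, timestamp, fee_wei) records by ledger and period.

    Returns {(ledger, period_label): {"count", "min_fee", "max_fee", "mean_fee"}}.
    """
    buckets = defaultdict(list)
    for ledger, ts, fee in txs:
        buckets[(ledger, period_key(ts, period))].append(fee)
    return {
        key: {
            "count": len(fees),
            "min_fee": min(fees),
            "max_fee": max(fees),
            "mean_fee": sum(fees) / len(fees),
        }
        for key, fees in sorted(buckets.items())
    }
```

Because every ledger is keyed by the same period labels, comparing usage across ledgers is a matter of lining up entries that share a label. The same grouping works for any per-transaction number (fee, value, gas used), not just fees.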

 
*How to Get Started*
To explore ledger operations and data, install go-ethereum (geth: 
[https://geth.ethereum.org/docs/install-and-build/installing-geth]) and sync it 
against a network to download its historical records. The Goerli test 
network's entire three years of data is only 32 GB, so there are data sets 
small enough to experiment with. By default, geth stores its data files on 
local disk under ~/.ethereum (on Linux).
 
Libraries such as web3.js ([https://web3js.readthedocs.io/en/v1.5.2/]) and 
Web3.py ([https://web3py.readthedocs.io/en/stable/]) can interact live with any 
given ledger, so reading data out of ledgers is straightforward.
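As a rough illustration of reading a ledger with Web3.py, the sketch below pulls a range of blocks with full transaction objects and flattens them into rows suitable for later aggregation or indexing. It assumes a node exposing JSON-RPC over HTTP (geth does this when started with `--http`, by default at http://127.0.0.1:8545); `block_to_rows` and `fetch_rows` are hypothetical helper names, and the row layout is just one possible choice:

```python
def block_to_rows(block, ledger):
    """Flatten one block fetched with full_transactions=True into per-transaction rows."""
    rows = []
    for tx in block["transactions"]:
        rows.append({
            "ledger": ledger,
            "block": block["number"],
            "timestamp": block["timestamp"],  # Unix seconds
            "hash": tx["hash"],               # HexBytes when it comes from Web3.py
            "from": tx["from"],
            "to": tx.get("to"),               # None for contract-creation transactions
            "value_wei": tx["value"],
            "gas_price_wei": tx.get("gasPrice"),
        })
    return rows


def fetch_rows(rpc_url, start, end, ledger):
    """Read blocks start..end (inclusive) from a node's JSON-RPC endpoint."""
    from web3 import Web3  # imported here so block_to_rows works without web3 installed

    w3 = Web3(Web3.HTTPProvider(rpc_url))
    rows = []
    for number in range(start, end + 1):
        block = w3.eth.get_block(number, full_transactions=True)
        rows.extend(block_to_rows(block, ledger))
    return rows
```

For example, `fetch_rows("http://127.0.0.1:8545", 0, 1000, "goerli")` against a synced local node would return one dict per transaction in the first 1001 blocks. Note that `gasPrice` alone is not the fee actually paid; that also needs `gasUsed` from `w3.eth.get_transaction_receipt`, which costs one extra RPC call per transaction.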
 
Reading and indexing the actual data may require writing custom parsers for 
Mahout and Lucene, and possibly decompiling contract bytecode back into readable 
Solidity code, so several pieces will need to be planned out.
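Decompiling back to Solidity is a hard problem, but the usual first step is plain disassembly: walking the bytecode and splitting it into opcodes, where PUSH1..PUSH32 (0x60..0x7f) carry 1..32 bytes of inline data. A minimal sketch, assuming the bytecode arrives as a hex string (for instance from Web3.py's `w3.eth.get_code(address)`); the opcode table is a deliberately small subset of the EVM instruction set and `disassemble` is a hypothetical name:

```python
# Small subset of EVM opcode mnemonics; PUSH/DUP/SWAP ranges are handled in mnemonic().
OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x02: "MUL", 0x03: "SUB", 0x04: "DIV",
    0x10: "LT", 0x11: "GT", 0x14: "EQ", 0x15: "ISZERO", 0x16: "AND", 0x1C: "SHR",
    0x34: "CALLVALUE", 0x35: "CALLDATALOAD", 0x36: "CALLDATASIZE", 0x39: "CODECOPY",
    0x50: "POP", 0x51: "MLOAD", 0x52: "MSTORE", 0x54: "SLOAD", 0x55: "SSTORE",
    0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST", 0x5F: "PUSH0",
    0xF3: "RETURN", 0xFD: "REVERT", 0xFE: "INVALID",
}


def mnemonic(op):
    """Return the mnemonic for one opcode byte."""
    if 0x60 <= op <= 0x7F:
        return f"PUSH{op - 0x5F}"
    if 0x80 <= op <= 0x8F:
        return f"DUP{op - 0x7F}"
    if 0x90 <= op <= 0x9F:
        return f"SWAP{op - 0x8F}"
    return OPCODES.get(op, f"UNKNOWN_0x{op:02x}")


def disassemble(code_hex):
    """Split EVM bytecode into (offset, mnemonic, push_data_or_None) tuples."""
    code = bytes.fromhex(code_hex[2:] if code_hex.startswith("0x") else code_hex)
    out = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:
            size = op - 0x5F
            # Slicing tolerates a PUSH truncated at the end of the code.
            data = code[pc + 1 : pc + 1 + size]
            out.append((pc, mnemonic(op), "0x" + data.hex()))
            pc += 1 + size
        else:
            out.append((pc, mnemonic(op), None))
            pc += 1
    return out
```

For example, `disassemble("0x6080604052")` yields PUSH1 0x80, PUSH1 0x40, MSTORE, the prologue that Solidity-compiled contracts commonly start with. Recovering control flow and Solidity-level structure from this listing is the part that would need real planning.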



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
