Hi,
Kammi’s summary is very comprehensive. Try to open source the project first,
and find an experienced mentor to help you; it will be very
helpful! Good luck.
Best Regards
---
DolphinScheduler(Incubator) PPMC
Lidong Dai
dailidon...@gmail.com
---
On Sun, Feb 28, 2021 at 6:52 PM Furkan KAMACI wrote:
> Hi,
>
> Actually, you have detailed documentation that explains how your approach
> compares to similar systems and the performance gains that follow from it,
> e.g. reducing storage by 10 to 100 times or having low-latency
> queries.
>
> My advice is as follows (some points are the same as Sheng's and Liang's):
>
> 1) Find an experienced mentor to guide you.
>
> 2) Start translating your documentation into English.
>
> 3) Open source your project. How can we comment on your project if
> we cannot see anything about it?
>
> 4) Gain contributors to your project. At the least, you should show your
> intention to have committers/contributors from outside your company. Eliminate
> the risk of non-meritocratic management of the project.
>
> 5) Structure your proposal. Explain why people need this project, which
> problems current projects have, and how you managed to handle them. We
> should understand whether it is a bundle of other projects, a completely new
> project, or a wrapper around other projects that eliminates their
> shortcomings.
>
> 6) Find a suitable name for your project so that you do not have to solve
> trademark problems that may cost you time if you enter incubation.
>
> Kind Regards,
> Furkan KAMACI
>
>
> On Sun, Feb 28, 2021 at 1:02 PM Liang Chen wrote:
>
> > Hi
> >
> > It would be better if you could find an experienced IPMC member to help
> > you prepare the proposal.
> > Building on Sheng Wu's input, I have one more comment: can you please explain
> > how it differs from other similar data analysis DBs? You can
> > consider explaining from a use-case perspective.
> >
> > Regards
> > Liang
> >
> >
> > fp wrote
> > > Dear Apache Incubator Community,
> > >
> > >
> > > Please accept the following proposal for presentation and discussion:
> > > https://github.com/lucene-cn/lxdb/wiki
> > >
> > >
> > > LXDB is a high-performance OLAP and full-text search database. It is based on
> > > HBase, but replaces HFile with a Lucene index to support more effective
> > > secondary indexes. It is also based on Spark SQL, so you can use the SQL
> > > API to access data and run OLAP computations. The Lucene index is stored on
> > > HDFS (not on local disk).
> > >
> > >
> > > In our production system, LXDB supports 200+ clusters; some single
> > > clusters have 1000+ nodes and insert 200 billion rows per day (2
> > > billion rows in total). One of the biggest single tables has 200 million
> > > Lucene index on LXDB.
> > >
> > >
> > > Doug Cutting, the father of Hadoop, split Nutch into HBase, MapReduce (Hive),
> > > HDFS, and Lucene. We have merged these separate projects again: LXDB equals
> > > Spark SQL + HBase + Lucene + Parquet + HDFS; it is a super database. It took me
> > > 10 years to complete this merging. The purpose, however, is no
> > > longer a search engine but a database.
> > >
> > >
> > >
> > >
> > >
> > > Best regards
> > > yannian mu
> > >
> > >
> > >
> > >
> > > LXDB Proposal
> > > == Abstract ==
> > > LXDB is a high-performance OLAP and full-text search database.
> > >
> > >
> > > === It is based on HBase, but replaces HFile with a Lucene index to support
> > > more effective secondary indexes ===
> > > We modified the HBase region server and changed HFile to Lucene: on a put,
> > > we write a document to Lucene instead of writing data to HFile.
> > > The Lucene index is stored on the region server (it is not stored in a
> > > different cluster as with Elasticsearch + HBase, which takes two copies of
> > > the data).
> > >
> > >
> > > === It is based on Spark SQL for OLAP ===
> > > We integrated Spark and HBase together. Usage is as follows:
> > > 1. Unpack lxdb.tar.gz.
> > > 2. Configure the hadoop_config path.
> > > 3. Run start-all.sh to start the cluster.
> > > LXDB starts Spark through Hadoop YARN, and each Spark executor
> > > process then starts an embedded HBase region server service.
> > >
> > >
> > > You can operate the LXDB database through the Spark SQL API (Hive) or the
> > > MySQL API.
> > > 1. The SQL uses a Spark RDD + HBase scanner to access HBase.
> > > 2. The SQL's conditions (filter or group-by aggregation) are pushed down to
> > > HBase.
> > > 3. HBase uses the Lucene index to filter data in the region server.
> > > Spark, HBase, and Lucene are all embedded and integrated together; they are
> > > not separate clusters. That is the difference from a Solr/ES +
> > > HBase + Spark solution.
> > >
> > >
> > > == Background ==
> > > === Multiple copies of data ===
> > > Apache HBase + Elasticsearch is the most popular solution for full-text
> > > search, but it is weak at online analytical processing (OLAP).
> > > So most of the time a production system uses Spark (or Hive, Impala, or
> > > Presto), HBase, and Solr/ES at the same time. Multiple copies of data are
>