Re: Solr for noSQL

2011-01-31 Thread Steven Noels
On Fri, Jan 28, 2011 at 1:30 AM, Jianbin Dai wrote:

> Hi,
>
>
>
> Do we have data import handler to fast read in data from noSQL database,
> specifically, MongoDB I am thinking to use?
>
> Or a more general question, how does Solr work with noSQL database?
>


Can't say anything about MongoDB, but we have an integration of SOLR with
HBase inside Lily - www.lilyproject.org. It uses the 'normal' SOLR index
update API rather than a DIH, because we needed incremental updates. The
Indexer component we wrote maps the Lily/HBase schema to the SOLR schema,
since we felt the two schemas shouldn't have to be identical.
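
To make the mapping idea concrete, here is a minimal sketch in Python (not
Lily's actual Indexer). It assumes a hypothetical field mapping and
hypothetical SOLR field names (title_t, author_s, pages_i), and builds the
XML <add> message that SOLR's update handler accepts. Only mapped fields end
up in the index document, which is why the two schemas need not be identical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from repository field names to SOLR field names.
FIELD_MAPPING = {
    "title": "title_t",
    "author": "author_s",
    "pages": "pages_i",
}


def to_solr_add_xml(record_id, record, mapping=FIELD_MAPPING):
    """Build a SOLR XML <add> message for one record, keeping only mapped fields."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = str(record_id)
    for source_name, solr_name in mapping.items():
        if source_name in record:
            field = ET.SubElement(doc, "field", name=solr_name)
            field.text = str(record[source_name])
    return ET.tostring(add, encoding="unicode")


# "internal_flag" has no mapping, so it is not sent to the index.
record = {"title": "Lily", "pages": 42, "internal_flag": True}
xml_message = to_solr_add_xml("rec-1", record)
print(xml_message)
```

The resulting string would then be POSTed to the update handler of a SOLR
instance, one message per changed record, which is what keeps the updates
incremental.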

Steven.
-- 
Steven Noels
http://outerthought.org/
Scalable Smart Data
Makers of Kauri, Daisy CMS and Lily


Re: Solr for noSQL

2011-01-31 Thread Steven Noels
On Mon, Jan 31, 2011 at 9:38 PM, Upayavira wrote:

>
>
> On Mon, 31 Jan 2011 08:40 -0500, "Estrada Groups" wrote:
> > What are the advantages of using something like HBase over your standard
> > Lucene index with Solr? It would seem to me like you'd be losing a lot of
> > what Lucene has to offer!?!
>
> I think Steven is saying that he has an indexer app that reads from
> HBase and writes to a standard Solr by hitting its Rest API.
>
> So, nothing funky, just a little app that reads from HBase and posts to
> Solr.
>


We're offering a relational-database-like experience (i.e. a schema
language, storing typed data instead of byte[]s, secondary indexing
facilities) with some content management features (versioning, blob
storage), combined with SOLR as a search index (with a mapping between our
schema and that of SOLR). The index is maintained incrementally, and through
map/reduce for reindexing. You can keep multiple versions of the index if you
want, with state management, and we do text extraction with Tika. All this
runs fully distributed, so you can play with different boxes serving as HBase
datanode, index feeder, SOLR search node, etc.

All that sits behind a Java API that uses Avro underneath, and a REST
interface as well (searches go directly to SOLR). For future versions, we
will integrate a recommendation engine and some analytics tools as well.
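
Since searches go directly to SOLR, a client only needs to build a plain
SOLR query URL. A minimal sketch in Python, assuming a hypothetical base URL
and a hypothetical field name (title_t); q, rows and wt are standard SOLR
query parameters:

```python
from urllib.parse import urlencode


def solr_select_url(base_url, query, rows=10):
    """Build the URL of a SOLR select request that returns JSON."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical host and field name, for illustration only.
url = solr_select_url("http://localhost:8983/solr", "title_t:lily")
print(url)
```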

So yes, we do more (or rather: different things) than what Lucene/SOLR does:
we offer a full-featured data storage environment that stuffs your data in
HBase (which scales better than MySQL) and makes it searchable through SOLR.

The 'funky app' you're referring to now sits at about 3 man-years of
full-time development, BTW. ;-)

Steven.


Re: Solr for noSQL

2011-02-01 Thread Steven Noels
On Tue, Feb 1, 2011 at 11:52 AM, Upayavira wrote:


>
> Apologies if my "nothing funky" sounded like you weren't doing cool
> stuff.


No offense whatsoever. I think my longer reply paints a more accurate picture
of what Lily means in terms of "SOLR for NoSQL", and it was your reaction
that triggered this additional explanation.


> I was merely attempting to say that I very much doubt you were
> doing anything funky like putting HBase underneath Solr as a replacement
> of FSDirectory.


There are some initiatives in the context of Cassandra IIRC, as well as a
project which stores Lucene index files in HBase tables, but frankly they
seem more like experiments. I also think that how Lucene/SOLR works and what
HBase does on top of Hadoop FS are somehow in conflict with each other. Too
many layers of indirection will kill performance on every layer.



> I was trying to imply that, likely your integration with
> Solr was relatively conventional (interacting with its REST interface),
>


Yep. We figured that was the wiser road to walk: it leaves a clearly defined
interface and room for improvement, as opposed to a too-low-level
integration.


> and the "funky" stuff that you are doing sits outside of that space.
>
> Hope that's a clearer (and more accurate?) attempt at what I was trying
> to say.
>
> Upayavira (who finds the Lily project interesting, and would love to
> find the time to play with it)
>

Anytime, Upayavira. Anytime! ;-)

Steven.


Lily 0.3 is released

2011-02-14 Thread Steven Noels
s on IOExceptions, this allows
  operations to survive node failures.
      - Automatic balancing over all Lily nodes. Each method called on the
      Repository object will automatically be performed on an arbitrarily
      selected Lily node.
      - Avro: switch from HTTP to Netty transport. For this, upgraded to an
      Avro 1.5 snapshot with patch AVRO-747.
   - Tester tool
      - Allows configuring test scenarios as well as the indexer and SOLR
      configuration.
      - Has extended logging, metrics and metrics plotting (gnuplot
      integration) capabilities, allowing for performance evaluations.
      - Introduces a general performance testing library.
   - Lily server process
      - Ability to create tables with multiple initial regions at first
      cluster startup (record table, linkindex, blobincubator, ...). Also
      allows setting the max file size and the memstore flush size.
      - The initial Lily startup can now be performed on multiple nodes
      concurrently. Previously this failed because the table creation code
      did not handle failures in case of concurrent table creation.
      - Configuration files changed so that they allow for inheritance (=
      fallback from one conf dir to another, to the built-in conf). Include
      default configuration in Kauri-module jars. All this will help in
      maintaining Lily configuration across Lily versions.
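
The configuration inheritance mentioned in the last item can be pictured as
an ordered lookup. This is only a rough Python sketch of the fallback idea,
not Lily's actual implementation; the function name, directory names and
file name are hypothetical.

```python
import os
import tempfile


def resolve_conf(relative_path, conf_dirs):
    """Return the first existing file for relative_path, searching conf_dirs in order."""
    for conf_dir in conf_dirs:
        candidate = os.path.join(conf_dir, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None  # the caller would fall back to the built-in default here


# Demo: the site directory overrides nothing, so the lookup falls through
# to the default directory.
with tempfile.TemporaryDirectory() as root:
    site_dir = os.path.join(root, "site")
    default_dir = os.path.join(root, "default")
    os.makedirs(site_dir)
    os.makedirs(default_dir)
    with open(os.path.join(default_dir, "repository.xml"), "w") as handle:
        handle.write("<conf/>")
    found = resolve_conf("repository.xml", [site_dir, default_dir])
    print(os.path.relpath(found, root))
```

With this kind of lookup, a site-specific directory only needs to contain
the files that differ from the defaults.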

We hope you'll enjoy this new Lily as much as we did making it. Let us know
how we're doing!

The Outerthought Lily team.


[ann] Lily 1.0 is out: Smart Data at Scale, made Easy!

2011-05-05 Thread Steven Noels
Hi all,

We’re really proud to announce the first official major release of Lily,
our flagship repository for scalable data and content management,
after 18 months of intense engineering work. We’re thrilled to be the
first to launch an open source, general-purpose, highly scalable yet
flexible data repository based on NOSQL/BigData technology: read all
about it below.

>What

Lily is a data and content repository made for the Age of Data: it
allows you to store and manage vast amounts of data, and in the future
will allow you to monetize user interactions by tracking and analyzing
audience data.

Lily makes Big Data easy with a high-level, developer-friendly data
model with rich types, versioning and schema management. Lily offers
simple Java and REST APIs for creating, reading and managing data. Its
flexible indexing mechanism supports interactive and batch-oriented
index maintenance.

Lily is the foundation for any large-scale data-centric application:
social media, e-commerce, large content management applications,
product catalogs, archiving, media asset management: any data-centric
application with an ambition to scale beyond a single-server setup.

Lily is dead serious about Scale. The Lily repository has been tested
to scale beyond any common content repository technology out there,
due to its inherently distributed architecture, providing economically
affordable, robust, and high-performing data management services for
any kind of enterprise application.

>For whom

Lily puts BigData technology within reach of enterprise and corporate
developers, wrapping high-care leading-edge technology in a
developer- and administrator-friendly package. Lily offers the
flexibility and scalability of Apache HBase, the de-facto leading
Google BigTable implementation, and the sophistication and robustness
of Apache SOLR, the market leader of open source enterprise and
internet search. Lily sits on the shoulders of these Big Data
revolution leaders, and provides additional ease of use needed for
corporate adoption.

>Thanks

Lily builds further upon the best data and search technology out
there: Apache HBase and SOLR. HBase is in use at some of the largest
data properties out there: Facebook, StumbleUpon and Yahoo! SOLR is
rapidly replacing proprietary enterprise search solutions all over the
place and is one of the most popular open source projects at the
Apache Software Foundation. We're thankful for the developer
communities working hard on these projects, and strive hard to
contribute back where possible. We're also appreciative of the
commercial service suppliers backing these projects: Lucid Imagination
and Cloudera.

>Where

Everything Lily can be found at www.lilyproject.org. Enjoy!

Thanks,

The Lily team @ http://outerthought.org/

Outerthought
Scalable Smart Data, made Easy
Makers of Kauri, Daisy CMS and Lily


Something for the weekend - Lily 0.2 is OUT ! :)

2010-10-29 Thread Steven Noels
Dear all,

Three months after the highly anticipated proof of architecture release,
we're living up to our promises, and are releasing Lily 'CR' 0.2 today - a
fully-distributed, highly scalable and highly available content repository,
marrying best-of-breed database and search technology into a powerful,
productive and easy-to-use solution for contemporary internet-scale content
applications.

For whom

You're building content applications (content management, archiving, asset
management, DMS, WCMS, portals, ...) that scale well, either as a product, a
project or in the cloud. You need a trustworthy underlying content
repository that provides a flexible and easy-to-use content model you can
adapt to your requirements. You have a keen interest in NoSQL/HBase
technology but need a higher-level API, and scalable indexing and search as
well.

Foundations

Lily builds further upon Apache HBase and Apache SOLR. HBase is a faithful
implementation of the Google BigTable database, and provides infinite
elastic scaling and high-performance access to huge amounts of data. SOLR is
the server version of Lucene, the industry-standard search library. Lily
joins HBase and SOLR in a single, solidly packaged content repository
product, with automated sharding (making use of multiple hardware nodes to
provide scaling of volume and performance) and automatic index maintenance.
Lily adds a sophisticated, yet flexible and surprisingly practical content
schema on top of this, providing the structuredness of more classic
databases, versioning, secondary indexing, queuing: all the stuff developers
care for when fixing real-world problems.

Key features of this release

   - Fully distributed: Lily has a fully-distributed architecture making
   maximum use of all available hardware for scalability and availability.
   ZooKeeper is used for distributed process coordination, configuration and
   locking. Index maintenance is based on an HBase-backed RowLog mechanism
   allowing fast but reliable updating of SOLR indexes.
   - Index maintenance: Lily offers all the features and functionality of
   SOLR, but makes index maintenance a breeze, both for interactive as-you-go
   updating and MapReduce-based full index rebuilds
   - Multi-indexers: for high-load situations, multiple indexers can work in
   parallel and talk to a sharded SOLR setup
   - REST interface: a flexible and platform-neutral access method for all
   Lily operations using HTTP and JSON
   - Improved content model: we added URI as a base Lily type as a (small)
   indication of our interest in semantic technology
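
The "fast but reliable" index updating mentioned under "Fully distributed"
can be pictured as a queue of pending updates where a message is only dropped
once indexing succeeded. The following Python sketch illustrates that general
idea with an in-memory queue. It is an assumption-laden illustration, not the
actual HBase-backed RowLog, and all names in it are hypothetical.

```python
from collections import deque


class RowLogSketch:
    """In-memory stand-in for a durable queue of pending index updates."""

    def __init__(self):
        self._pending = deque()

    def put(self, record_id):
        """Register that a record changed and still has to be (re)indexed."""
        self._pending.append(record_id)

    def process(self, index_fn):
        """Try each pending message once; keep the ones whose indexing failed."""
        retry = deque()
        while self._pending:
            record_id = self._pending.popleft()
            try:
                index_fn(record_id)
            except IOError:
                retry.append(record_id)  # stays queued for the next pass
        self._pending = retry
        return len(retry)


indexed = []
failed_once = set()


def flaky_index(record_id):
    """Hypothetical indexer that fails the first time it sees record 'b'."""
    if record_id == "b" and record_id not in failed_once:
        failed_once.add(record_id)
        raise IOError("search node unavailable")
    indexed.append(record_id)


log = RowLogSketch()
for rid in ("a", "b", "c"):
    log.put(rid)
remaining_first = log.process(flaky_index)
remaining_second = log.process(flaky_index)
print(indexed, remaining_first, remaining_second)
```

The point of the design is that a failing search node delays an update but
does not lose it.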

More importantly, we commit ourselves to take care of API compatibility and
data format layout from this release onwards - as much as humanly possible.
Lily 0.2 offers the API we want to support in the final release. Lily 0.2 is
our contract with content application developers: upgrading to Lily final
should require as few code or data changes as possible.

From where

Download Lily from www.lilyproject.org. It's Apache Licensed Open Source. No
strings attached.

Enterprise support

Together with this release, we're rolling out our commercial support
services <http://outerthought.org/site/services/lily.html> (and signed up a
first customer, yay!) that allow you to use Lily with peace of mind. Also,
this release has been fully tested against, and depends on, the latest
Cloudera Distribution for Hadoop <http://www.cloudera.com/hadoop/> (CDH3
beta3).

Next up

Lily 1.0 is planned for March 2011, with an interim release candidate in
January. We'll be working on performance enhancements, feature additions,
and are happily - eagerly - awaiting your feedback and comments. We'll post
a roadmap for Lily 0.3 and onwards by mid November.

Follow us

If you want to keep track of Lily's on-going development, join the Lily
discussion list or follow our company Twitter
@outerthought <http://twitter.com/#%21/outerthought>.

Thank you

I'd like to thank Bruno and Evert for their hard work so far, the HBase and
SOLR communities for their help, the IWT government fund for their partial
financial support, and all of our early Lily adopters and enthusiasts for
their much valued feedback. You guys rock!

Steven.
-- 
Steven Noels
http://outerthought.org/
Open Source Content Applications
Makers of Kauri, Daisy CMS and Lily