Re: [sqlite] Database to SQLite Population

2014-04-06 Thread danap
> On 4/5/2014 12:33 PM, da...@dandymadeproductions.com wrote:
>> The assumption is that the networked database, datasource, could be on
>> the local lan or Internet.
>
> So am I to understand that you indeed are concerned (1) that the pipe
> could be slow, and (2) that the server may be heavily loaded?  Alright.
>
>> The 'snapshot' would not necessarily be everything, but based on a
>> SELECT statement of a set of the datasource content.
>
> Okay, that's good.
>
>> The benefit I see from this local file/memory database is that I have
>> found some processing of data for analysis occurs over and over to
>> derive comparison results. By having the data local the user can
>> perform these analyses without constantly re-querying the production
>> database.
>
> That makes sense, though there are costs, but you say...
>
>> It is assumed that the user knows that data can be stale at any point
>> beyond the initial load.
>
> Okay, also good that this limitation is known and accepted.
>
>> The analysis tools can also remain unchanged since the data is still
>> coming from an RDBMS.
>
> Yes, good point.  This is definitely a massive plus over rolling your
> own caching mechanisms.
>
>> The only reason the queue was considered is because it is quite
>> conceivable that a network datasource would be the choke point so a
>> queue being filled by several threads in process 1 would speed up the
>> population.
>
> I'm trying to understand your statement because at first glance it seems
> contradictory.  You assert the data source may be slow, but the solution
> you present is normally used when interfacing with a slow data *sink*.
>
> My best guess is you're considering running multiple source queries
> simultaneously to better utilize (monopolize, really) the
> network, and this design indeed typically needs a queue to serialize its
> output so it can be fed to a sink that accepts only one thing at a time.
>

Yes, and as you point out later, SQLite is assumed to be fast at populating
the data.

> I have two answers for you.  Pick whichever you like.  I apologize in
> advance for the level of detail in answer #2.  I just want to be clear
> so you can make the right decision.  Also I can't help but plug the
> Wibble web server (featuring Tcl coroutines) since it's my baby.
>
> Answer 1: Keep it simple.  Have one connection only, and just write
> whatever you receive as you get it.  SQLite will not be your bottleneck.
> One, it's very fast.  Two, you already say the source is the choke
> point.  No need to complicate things.
>
> Answer 2: You really do need to have multiple connections at a time, and
> you're willing to have a more complex system to support this approach.
>

I do not want to clog up the mailing list with additional details, so I have
not quoted the details from Answer 2.

In short, I would prefer Answer 1, and will have to weigh the benefit of some
performance increase against the level of complexity introduced by Answer 2.

Again, thank you Andy for your input. I will take time to digest the insight
you have provided in Answer 2 and review it to more fully understand the
possibilities.

>
> ~
> ~
> ~
>
> So to sum up, you want to repeatedly analyze one or more data sets which
> just so happen to have been read from a remote database, but you don't
> want to make that remote database do all the work because it could be at
> the nether end of the Internet.  You want to do this analysis using your
> existing codebase which was designed to operate using SQL.  You see
> SQLite as a good fit because it's compatible (useful subset of SQL) and
> is trivial to set up (link it into your program and you're done).  That
> all seems totally reasonable to me.
>
> --
> Andy Goth | 

I think you summed it up pretty well.

Dana M. Proctor
MyJSQLView Project Manager



Re: [sqlite] Database to SQLite Population

2014-04-05 Thread danap
> On 4/4/2014 1:21 PM, da...@dandymadeproductions.com wrote:
>> While working with the MyJSQLView database GUI access tool, it has been
>> determined that a local file/memory database would be valuable for
>> performing recurring analysis on datasets from the connected datasource.
>> SQLite is being considered as the local database.
>
> If I understand you correctly, you're suggesting making a local snapshot
> of a networked database to optimize performance.  I'm not sure what
> remote database you're using, but it seems to me with properly designed
> prepared statements there won't be much gain in downloading everything
> in advance to the local machine, especially since the download will
> certainly include more data than is actually needed.  Additionally
> consider the loss of coherency when the upstream database is modified
> but the local snapshot becomes stale.

The assumption is that the networked database, the datasource, could be on the
local LAN or the Internet. The 'snapshot' would not necessarily be everything,
but would be based on a SELECT statement over a subset of the datasource
content. The application already has a mechanism in place by which the user can
store queries in a bucket for reuse. I guess a similar commercial term for this
would be ETL, though perhaps without the transform. I believe one of the
commercial tools out there, Tableau, uses this exact concept.

The benefit I see from this local file/memory database is that I have found
some processing of data for analysis occurs over and over to derive comparison
results. By having the data local, the user can perform these analyses without
constantly re-querying the production database. The analysis tools can also
remain unchanged since the data is still coming from an RDBMS. It is assumed
that the user knows the data can be stale at any point beyond the initial
load.

>
>> All the underlying code has been created for conversion between
>> datasource and local database. The code is now being developed to
>> perform the DB to DB transfer population. The basic algorithm being
>> considered is:
>>
>> Process 1, Loop: 1,2,3
>> 1. Read Datasource row from query.
>> 2. Create SQL INSERT statement.
>> 3. Write SQL INSERT into queue.
>>
>> Process 2, Loop: 4,5
>> 4. Read SQL INSERT from queue.
>> 5. Write SQL INSERT to SQLite db.
>
> The queue seems to be an unnecessary intermediary.  Simply alternate
> between reading from the remote database and writing the received data
> to the SQLite database.  This simpler design is also more amenable to
> prepared statements which offer indispensable performance and security
> benefits.

Thank you for that input. My general thoughts were along the lines of your
simpler design. The only reason the queue was considered is that it is quite
conceivable a networked datasource would be the choke point, so a queue being
filled by several threads in process 1 would speed up the population. A rough
sketch of that design is included below for reference.
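
The kind of design I had in mind looks roughly like the sketch below: several
reader threads (process 1) filling a bounded queue and a single writer thread
(process 2) draining it into SQLite. This is only a sketch under my own
assumptions; the Row class, table, and column names are hypothetical
placeholders, and I have not measured whether it actually beats a
single-connection transfer.

// Rough sketch of the queue idea: reader threads (Process 1) pull rows from
// the datasource query and enqueue them, while a single writer thread
// (Process 2) drains the queue into the SQLite file.
import java.sql.*;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueuedTransfer {

    /** Hypothetical carrier for one row of source data. */
    static class Row {
        final int id; final String name; final double value;
        Row(int id, String name, double value) {
            this.id = id; this.name = name; this.value = value;
        }
    }

    /** Marker that whatever coordinates the producers must enqueue
        once all of the reader threads have finished. */
    static final Row END_OF_DATA = new Row(-1, null, 0.0);

    final BlockingQueue<Row> queue = new ArrayBlockingQueue<Row>(1024);

    /** Process 1: read rows from the (possibly slow) datasource and enqueue them. */
    void produce(ResultSet rs) throws SQLException, InterruptedException {
        while (rs.next()) {
            queue.put(new Row(rs.getInt(1), rs.getString(2), rs.getDouble(3)));
        }
    }

    /** Process 2: drain the queue and write each row to SQLite in one transaction. */
    void consume(String sqliteFile) throws SQLException, InterruptedException {
        Connection local = DriverManager.getConnection("jdbc:sqlite:" + sqliteFile);
        try {
            local.setAutoCommit(false);
            PreparedStatement ins = local.prepareStatement(
                "INSERT INTO source_table (id, name, value) VALUES (?, ?, ?)");
            for (Row r = queue.take(); r != END_OF_DATA; r = queue.take()) {
                ins.setInt(1, r.id);
                ins.setString(2, r.name);
                ins.setDouble(3, r.value);
                ins.executeUpdate();
            }
            ins.close();
            local.commit();
        } finally {
            local.close();
        }
    }
}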

>
> Do keep in mind that an SQL database consists not only of INSERTs (the
> data) but also CREATE TABLEs (the schema) plus indexes and triggers and
> views.

The data is the only interest here, besides indexes for the new table data.
Routines have already been completed to re-create the structure of the
datasource database.

>
>> Perhaps someone from this forum could comment on another open source
>> project with a similar type of db-to-db transfer that could be studied,
>> or suggest an alternative algorithm.
>
> It doesn't matter what database you use; the algorithm remains the same:
> read source, write destination, repeat.  Many database systems have
> convenient import routines for common data sources.  For example, the
> SQLite shell has the .import command for loading a file into a table.
>
> --
> Andy Goth | 

These import routines are exactly how I have had to populate a local SQLite
database when testing analysis plugins. The process of setting up the database
table(s), exporting the data from the datasource, and importing it into the
local file/memory database would be much simpler for non-expert users if
automated, so they can focus on deriving results from analysis with a local,
higher-performance file/memory database.
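
For reference, the manual steps with the SQLite shell look roughly like the
session below; the database, table, and file names are only examples:

$ sqlite3 local_analysis.db
sqlite> CREATE TABLE measurements (id INTEGER, name TEXT, value REAL);
sqlite> .mode csv
sqlite> .import measurements_export.csv measurements
sqlite> SELECT COUNT(*) FROM measurements;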

Thank you Andy for your comments.

Dana M. Proctor
MyJSQLView Project Manager



[sqlite] Database to SQLite Population

2014-04-04 Thread danap
Hello,

While working with the MyJSQLView database GUI access tool, it
has been determined that a local file/memory database would
be valuable for performing recurring analysis on datasets from
the connected datasource. SQLite is being considered as the
local database.

All the underlying code has been created for conversion
between datasource and local database. The code is now being
developed to perform the DB to DB transfer population. The
basic algorithm being considered is:

Process 1, Loop: 1,2,3
1. Read Datasource row from query.
2. Create SQL INSERT statement.
3. Write SQL INSERT into queue.

Process 2, Loop: 4,5
4. Read SQL INSERT from queue.
5. Write SQL INSERT to SQLite db.

Perhaps someone from this forum could comment on another
open source project with a similar type of db-to-db transfer
that could be studied, or suggest an alternative algorithm.

Thanks,

Dana M. Proctor
MyJSQLView Open Source Project Manager.
