Re: [sqlite] FTS & Doc Compression

2010-03-25 Thread Alexey Pechnikov
Hello!

On Tuesday 02 March 2010 02:41:46 Jason Lee wrote:
> I've been playing around with the FTS3 (via the amalgamation src) on a
> mobile device and it's working well. But my db file size is getting
> pretty big and I was looking for a way to compress it. I've seen some
> earlier posts from Alexey for his compression modifications to the
> FTS3 extension, but nothing for the amalgamation file.

It's easy to build SQLite from the full source tree.

The modified files ext/fts3/fts3.c and ext/fts3/fts3_write.c for SQLite 3.6.23
are here:
http://sqlite.mobigroup.ru/src/vinfo/d3d9906674
or direct links:
http://sqlite.mobigroup.ru/src/artifact/57b279352c
http://sqlite.mobigroup.ru/src/artifact/daee6be790
(click the "download" link)

Or you can patch your amalgamation...

Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS & Doc Compression

2010-03-25 Thread Alexey Pechnikov
Hello!

On Tuesday 02 March 2010 15:25:35 Max Vlasov wrote:
> can you calculate the ratio between your text data and fts3 data?

In my databases with Unicode text, the compressed data is about 25% of the original size.

Best regards, Alexey Pechnikov.
http://pechnikov.tel/


Re: [sqlite] FTS & Doc Compression

2010-03-03 Thread Alexandre Courbot
Although this problem doesn't affect me directly, an option to
transparently compress the text of FTS3 tables (not the indexes, just the
contents of the virtual column) using zlib would be great. I cut a database's
size in half by doing this on non-FTS3 text tables. Since DEFLATE is fast
even on embedded devices by today's standards, I'm convinced this could do
wonders.
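
For illustration only, here is a minimal sketch of the kind of zlib
compression described above for an ordinary (non-FTS3) text table. It is not
code from this thread: the table docs_raw, the view docs and the SQL
functions compress()/uncompress() are hypothetical names, registered from
Python's sqlite3 module rather than compiled into SQLite.

```python
import sqlite3
import zlib

def compress(text):
    """Deflate UTF-8 text into bytes (stored as a BLOB); NULL stays NULL."""
    return None if text is None else zlib.compress(text.encode("utf-8"))

def uncompress(blob):
    """Inflate a BLOB written by compress() back into text."""
    return None if blob is None else zlib.decompress(blob).decode("utf-8")

con = sqlite3.connect(":memory:")
# Hypothetical SQL function names; they exist only on this connection.
con.create_function("compress", 1, compress)
con.create_function("uncompress", 1, uncompress)

# The compressed bytes live in docs_raw; the view hands back plain text.
con.execute("CREATE TABLE docs_raw (id INTEGER PRIMARY KEY, body BLOB)")
con.execute("CREATE VIEW docs AS SELECT id, uncompress(body) AS body FROM docs_raw")

sample = "SQLite full-text search on a mobile device. " * 50
con.execute("INSERT INTO docs_raw (body) VALUES (compress(?))", (sample,))

stored_bytes = con.execute("SELECT LENGTH(body) FROM docs_raw").fetchone()[0]
restored = con.execute("SELECT body FROM docs").fetchone()[0]
print(len(sample.encode("utf-8")), "bytes of text stored in", stored_bytes, "bytes")
```

Reads go through the view, so callers see plain text, but the compressed
column can no longer be searched with LIKE or indexed directly. How much
space is saved depends entirely on the text.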

Alex.


Re: [sqlite] FTS & Doc Compression

2010-03-02 Thread Jason Lee
My db definitely did go up in size with fts, which I think is OK since
that's just the cost of using fts. So I'm not so concerned about the
stop words and things, although I agree that adjusting that list would
definitely help.

Since I'm on a mobile device, space is key. Even if I weren't using
fts, I'd still want to compress the db. But the fact that I need both
has led me to try to figure this out. I was just wondering if by some
chance someone had already done the hard work of wrapping some
compression functions into the amalgamation src, which would save me
some work. But if they haven't, then this might be something I will
have to do over the next few months.

While there would probably be some sort of speed hit, I think
compression would definitely be useful, especially on mobile devices.
In the case of the app I'm working on, our writes can afford to be
slow and can therefore use max compression, which could possibly give
us a nice small db size. Alexey's code provided a good starting point,
so I'll probably start there.
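
A rough sketch of that write-side trade-off, using Python's zlib module
rather than anything from Alexey's patch. The helper names pack(), unpack()
and compressed_sizes() are made up for this example; level 9 is zlib's
slowest, highest-compression setting and level 1 its fastest.

```python
import zlib

def pack(text, level=9):
    """Compress UTF-8 text; level 9 trades write speed for size."""
    return zlib.compress(text.encode("utf-8"), level)

def unpack(blob):
    """Decompress; the reader does not need to know the writer's level."""
    return zlib.decompress(blob).decode("utf-8")

def compressed_sizes(text, levels=(1, 6, 9)):
    """Return {level: compressed size in bytes} for the given text."""
    return {level: len(pack(text, level)) for level in levels}

document = "full-text search on a mobile device needs a small db file. " * 200
for level, size in compressed_sizes(document).items():
    print("level", level, "->", size, "bytes")
```

Since zlib.decompress() takes no level argument, the read path is the same
code whichever level the writes used. The actual gain from level 9 over the
default depends on the data, so it is worth measuring on real documents.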

Thanks for the reply.

- jason

On Tue, Mar 2, 2010 at 4:25 AM, Max Vlasov wrote:
> On Tue, Mar 2, 2010 at 2:41 AM, Jason Lee wrote:
>
>> Hi all,
>>
>> I've been playing around with the FTS3 (via the amalgamation src) on a
>> mobile device and it's working well. But my db file size is getting
>> pretty big and I was looking for a way to compress it.
>>
>
>
> Jason, can you calculate the ratio between your text data and fts3 data?
> My tests showed that fts doesn't take up that much space. For example, I
> once tried the en wikipedia abstracts as a test file (downloadable xml, I
> took the title and abstract from it): it's 3M records in a 500M file without
> fts; after indexing, the size grew to 1.5G. And I didn't even use stop
> words. So with proper stop-word usage the ratio can be even better.
>
> Max


Re: [sqlite] FTS & Doc Compression

2010-03-02 Thread Max Vlasov
On Tue, Mar 2, 2010 at 2:41 AM, Jason Lee wrote:

> Hi all,
>
> I've been playing around with the FTS3 (via the amalgamation src) on a
> mobile device and it's working well. But my db file size is getting
> pretty big and I was looking for a way to compress it.
>


Jason, can you calculate the ratio between your text data and fts3 data?
My tests showed that fts doesn't take up that much space. For example, I
once tried the en wikipedia abstracts as a test file (downloadable xml, I
took the title and abstract from it): it's 3M records in a 500M file without
fts; after indexing, the size grew to 1.5G. And I didn't even use stop
words. So with proper stop-word usage the ratio can be even better.
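
One possible way to get such a ratio, sketched in Python under a few
assumptions: the function name text_to_file_ratio() and its arguments are
invented for this example, the database size is taken as page_count times
page_size, and the table and column names are trusted, since they are
pasted into the SQL text.

```python
import sqlite3

def text_to_file_ratio(db_path, table, column):
    """Return (text_bytes, file_bytes, file_bytes / text_bytes) for one column."""
    con = sqlite3.connect(db_path)
    try:
        # LENGTH() of a BLOB counts bytes, so the cast measures encoded size.
        (text_bytes,) = con.execute(
            f'SELECT COALESCE(SUM(LENGTH(CAST("{column}" AS BLOB))), 0) '
            f'FROM "{table}"'
        ).fetchone()
        (page_count,) = con.execute("PRAGMA page_count").fetchone()
        (page_size,) = con.execute("PRAGMA page_size").fetchone()
    finally:
        con.close()
    file_bytes = page_count * page_size
    ratio = file_bytes / text_bytes if text_bytes else float("inf")
    return text_bytes, file_bytes, ratio
```

Running it once on a copy of the database without the fts table and once
with it gives the two sizes to compare. With the figures above, 500M growing
to 1.5G is a threefold increase in file size.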

Max


[sqlite] FTS & Doc Compression

2010-03-01 Thread Jason Lee
Hi all,

I've been playing around with the FTS3 (via the amalgamation src) on a
mobile device and it's working well. But my db file size is getting
pretty big and I was looking for a way to compress it. I've seen some
earlier posts from Alexey for his compression modifications to the
FTS3 extension, but nothing for the amalgamation file.

Does anyone know of any patches for this? I've combed all over the
sqlite website and haven't seen anything for fts document compression,
so I'm fairly sure this doesn't exist. But it doesn't hurt to ask.

TIA,

- jason