Re: Separate Solr build: help with the remaining last mile needed.

2021-03-04 Thread Dawid Weiss
> ./gradlew -Dskip.lucene=true assemble check -x test -x documentation
> -x checkBrokenLinks -x checkLocalJavadocLinksSite

I made Solr documentation compile last night. Some of the generated
links point at void (and thus the broken links checker fails) but I
think it's enough to make the repository split and then clean up
what's needed in each corresponding repository.

So... this Sunday? Prior to that, we'd have to disable CI build
services currently pointing at the master branch so that they don't
start failing (I'll remove all content from master).

Dawid

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Modify Lucene to make it an inverted index suitable for cloud native environment

2021-03-04 Thread f...@lucene.cn
With the launch of lxdb, I always feel that something is missing. Before
lxdb started, I took a pen and sketched what the perfect database would look
like in my mind. Now that the main design goals have been met, I still feel
that it is not perfect. In essence, Lucene, the core of lxdb, is not perfect
enough: it cannot be split, and it still has some deficiencies in a cloud
native environment.

The essence of lxdb is to integrate Spark, HBase and Lucene into one product,
just like the gourd babies cartoon I watched as a child: seven gourd brothers
merge into one big diamond gourd baby, which is more powerful. It has the
powerful OLAP analysis ability of Spark, the real-time update ability of
HBase, and rapid multi-dimensional filtering with the help of the Lucene
index. Roughly speaking, it is almost a perfect product that can meet most
scenarios in the big data field: distributed storage, distributed computing,
high concurrency and flexibility. Most products on the market cannot match
lxdb technically, since it combines the timeliness of Kudu, the OLAP
performance of Spark, the full-text retrieval of ES and the high concurrency
of HBase. But in real use there are some very unpleasant aspects. Let me give
some examples one by one.

== Existing problems ==
1. The process must be resident and cannot be used on demand
The disadvantage of Lucene and HBase is that once the service is started, the
processes must stay resident. Whether or not there are queries or data
imports, these processes must keep running.
What I would prefer is that, like native Spark, it starts processes when
there are SQL queries, and slowly reclaims them when they are no longer used.

2. Different computations on the same data are not separated, so resource
isolation between computations is impossible
Another disadvantage of resident processes is that all computations must read
through the resident process, yet most of the time tasks have different
priorities. An ad hoc query needs a much faster response than a batch query.
We want to give more resources, more quickly, to ad hoc queries and let batch
tasks run slowly in the background.

The resident process causes us a lot of trouble here. We would prefer to
separate computation, sending different types of tasks to different
processes, or even to different computing nodes, to avoid mutual interference.

3. Not splittable: the computing resources used by the same data cannot be
flexibly adjusted
For the same piece of data, we often want a very important query to run
quickly, so we would like to allocate a lot of computing resources to it and
get the result as soon as possible. For unimportant tasks, we would allocate
only a few processes and let them run slowly. Today that is not possible:
computing resources cannot be dynamically adjusted and sliced, and
computation can only be bound to fixed processes.

4. Multiple systems cannot interoperate
Most of the time, I wish the index format of lxdb were more open, so that it
could be used directly by other systems without any change. Take Hive: I
create a table and define it in Parquet format. Besides Hive itself, Impala
can directly access its data, and Presto and Spark can access it as well.
Such a system is more flexible.

Because HBase and Lucene are currently bound to processes, other systems can
only reach the data in lxdb by going through the lxdb service and its
resident processes, which greatly hurts efficiency and increases the
complexity of interworking between systems. We would prefer to interoperate
directly at the file layer through a Parquet-like format, without a transfer
service.

== How do we plan to solve this problem ==
1. We don't plan to abandon Lucene

Lucene is still the king in the field of full-text and multi-dimensional
retrieval. On the various performance indicators, nothing compares. I have
measured various data formats and database systems, but in this field nothing
surpasses Lucene: it is at the "king" ("Wang") level, bar none. The currently
popular Solr and Elasticsearch also rely, directly or indirectly, on Lucene.

2. We plan to transform Lucene

Lucene's core is the inverted index, together with the storage formats for
the forward (document-to-values) and inverted (term-to-documents) directions.
We intend to keep these concepts and the API interfaces, and the logic
remains unchanged.
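
To make the two directions concrete, here is a minimal sketch in plain Java.
It is neither Lucene nor lxdb code, and the names IndexShapesDemo and invert
are hypothetical. It derives an inverted view (term to ascending doc IDs)
from a forward view (doc ID to terms), which is the ordering property the
message relies on below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; this is neither Lucene nor lxdb code.
final class IndexShapesDemo {

  // Forward view: the position in the outer list is the doc ID and the inner
  // list holds that document's terms.
  // Inverted view: each term maps to the ascending list of doc IDs containing it.
  static Map<String, List<Integer>> invert(List<List<String>> forward) {
    Map<String, List<Integer>> inverted = new TreeMap<>();
    for (int docId = 0; docId < forward.size(); docId++) {
      // TreeSet removes duplicate terms within a single document.
      for (String term : new TreeSet<>(forward.get(docId))) {
        inverted.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
      }
    }
    return inverted;
  }

  public static void main(String[] args) {
    List<List<String>> forward = Arrays.asList(
        Arrays.asList("lucene", "index"),
        Arrays.asList("parquet", "index"),
        Arrays.asList("lucene"));
    // Terms come out sorted, and each postings list is sorted by doc ID.
    System.out.println(invert(forward));
  }
}
```

Because documents are visited in doc ID order, every postings list is already
sorted, without an explicit sort step.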

But we will replace the implementations of the inverted and forward storage:
the original blocktree, the block-compressed FDT and the column-stored
docvalues. In fact, we find that the nested columnar storage format (Parquet)
is very similar to the inverted index. It is only that, as many people use
Parquet, the data is stored in random order. Once we move Parquet into the
Lucene framework, the ordered nature of inverted lists should make Parquet
perform particularly well.
Moreover, Lucene's original 

Re: [VOTE] Release PyLucene 8.8.1

2021-03-04 Thread Phil
+1 from me

Tested the extension args output in JCC 3.9 - looks good!

Thanks,
Phil.

Andi Vajda writes:

> The PyLucene 8.8.1 (rc1) release tracking the recent release of
> Apache Lucene 8.8.1 is ready.
>
> A release candidate is available from:
> https://dist.apache.org/repos/dist/dev/lucene/pylucene/8.8.1-rc1/
>
> PyLucene 8.8.1 is built with JCC 3.9, included in these release artifacts.
>
> JCC 3.9 supports Python 3.3 up to Python 3.9 (in addition to Python 2.3+).
> PyLucene may be built with Python 2 or Python 3.
>
> Please vote to release these artifacts as PyLucene 8.8.1.
> Anyone interested in this release can and should vote !
>
> Thanks !
>
> Andi..
>
> ps: the KEYS file for PyLucene release signing is at:
> https://dist.apache.org/repos/dist/release/lucene/pylucene/KEYS
> https://dist.apache.org/repos/dist/dev/lucene/pylucene/KEYS
>
> pps: here is my +1



Re: Separate Solr build: help with the remaining last mile needed.

2021-03-04 Thread Dawid Weiss
Thanks, Houston!

On Thu, Mar 4, 2021 at 7:54 PM Houston Putman  wrote:
>
> Aha it's broken independently of your PR. I'll try to fix it, sorry for the 
> noise.
>
> - Houston
>
> On Thu, Mar 4, 2021 at 1:48 PM Dawid Weiss  wrote:
>>
>> Thanks. I might have broken something... but it seems unlikely. I
>> merely shuffled some things around - it should be an identical build.
>>
>> D.
>>
>> On Thu, Mar 4, 2021 at 7:45 PM Houston Putman  
>> wrote:
>> >
>> > It worked as of 10 days ago: 
>> > https://github.com/apache/lucene-solr/actions/workflows/docker-test.yml
>> >
>> > I created a test PR to see if it works based on master: 
>> > https://github.com/apache/lucene-solr/pull/2454
>> >
>> > On Thu, Mar 4, 2021 at 1:39 PM Dawid Weiss  wrote:
>> >>
>> >> Thanks Houston. Is this failure on the branch only (does it work on 
>> >> master)?
>> >>
>> >> Dawid
>> >>
>> >> On Thu, Mar 4, 2021 at 6:48 PM Houston Putman  
>> >> wrote:
>> >> >
>> >> > I'm not sure why the docker github action is failing... I've tried it 
>> >> > locally and it works fine. I'll do some more investigation.
>> >> >
>> >> > On Thu, Mar 4, 2021 at 3:25 AM Dawid Weiss  
>> >> > wrote:
>> >> >>
>> >> >> Hi folks,
>> >> >>
>> >> >> I'll need some help with some remaining tasks to make the transition
>> >> >> easier after the solr repo split. I've made some changes to allow
>> >> >> building just Solr or just Lucene on master --
>> >> >>
>> >> >> https://github.com/apache/lucene-solr/pull/2448
>> >> >>
>> >> >> 1. I really don't know much about docker testing and why it fails on 
>> >> >> that PR.
>> >> >>
>> >> >> 2. Lucene build seems to work just fine.
>> >> >>
>> >> >> 3. You can build Solr independently (with a Lucene snapshot from
>> >> >> Apache repository) by running:
>> >> >>
>> >> >> ./gradlew -Dskip.lucene=true assemble check -x test -x documentation
>> >> >> -x checkBrokenLinks -x checkLocalJavadocLinksSite
>> >> >>
>> >> >> the "-x" tasks are what I'll need some help with - I guess they do
>> >> >> have cross-references to Lucene-generated stuff that needs to be
>> >> >> replaced (or dropped).
>> >> >>
>> >> >> 4. There are three tests that need to be removed from the codebase,
>> >> >> moved to Lucene or rewritten: TestICUCollationField,
>> >> >> TestLuceneIndexBackCompat and TestXmlQParser. These tests require
>> >> >> access to Lucene test sources and these won't be published as an
>> >> >> artifact.
>> >> >>
>> >> >> An alternative to trying to solve the above is to just split the repo
>> >> >> and let it burn/ crash, but I thought by fixing those issues on
>> >> >> current master branch we can prepare the infrastructure while
>> >> >> everything else just works.
>> >> >>
>> >> >> Dawid
>> >> >>



Re: [VOTE] Release PyLucene 8.8.1

2021-03-04 Thread Dawid Weiss
Apologies for being late to the party: +1 from me.

D.

On Tue, Mar 2, 2021 at 3:35 AM Andi Vajda  wrote:
>
>
> The PyLucene 8.8.1 (rc1) release tracking the recent release of
> Apache Lucene 8.8.1 is ready.
>
> A release candidate is available from:
> https://dist.apache.org/repos/dist/dev/lucene/pylucene/8.8.1-rc1/
>
> PyLucene 8.8.1 is built with JCC 3.9, included in these release artifacts.
>
> JCC 3.9 supports Python 3.3 up to Python 3.9 (in addition to Python 2.3+).
> PyLucene may be built with Python 2 or Python 3.
>
> Please vote to release these artifacts as PyLucene 8.8.1.
> Anyone interested in this release can and should vote !
>
> Thanks !
>
> Andi..
>
> ps: the KEYS file for PyLucene release signing is at:
> https://dist.apache.org/repos/dist/release/lucene/pylucene/KEYS
> https://dist.apache.org/repos/dist/dev/lucene/pylucene/KEYS
>
> pps: here is my +1


Re: Separate Solr build: help with the remaining last mile needed.

2021-03-04 Thread Houston Putman
Aha it's broken independently of your PR. I'll try to fix it, sorry for the
noise.

- Houston

On Thu, Mar 4, 2021 at 1:48 PM Dawid Weiss  wrote:

> Thanks. I might have broken something... but it seems unlikely. I
> merely shuffled some things around - it should be an identical build.
>
> D.
>
> On Thu, Mar 4, 2021 at 7:45 PM Houston Putman wrote:
> >
> > It worked as of 10 days ago:
> > https://github.com/apache/lucene-solr/actions/workflows/docker-test.yml
> >
> > I created a test PR to see if it works based on master:
> > https://github.com/apache/lucene-solr/pull/2454
> >
> > On Thu, Mar 4, 2021 at 1:39 PM Dawid Weiss wrote:
> >>
> >> Thanks Houston. Is this failure on the branch only (does it work on
> >> master)?
> >>
> >> Dawid
> >>
> >> On Thu, Mar 4, 2021 at 6:48 PM Houston Putman wrote:
> >> >
> >> > I'm not sure why the docker github action is failing... I've tried it
> >> > locally and it works fine. I'll do some more investigation.
> >> >
> >> > On Thu, Mar 4, 2021 at 3:25 AM Dawid Weiss wrote:
> >> >>
> >> >> Hi folks,
> >> >>
> >> >> I'll need some help with some remaining tasks to make the transition
> >> >> easier after the solr repo split. I've made some changes to allow
> >> >> building just Solr or just Lucene on master --
> >> >>
> >> >> https://github.com/apache/lucene-solr/pull/2448
> >> >>
> >> >> 1. I really don't know much about docker testing and why it fails on
> >> >> that PR.
> >> >>
> >> >> 2. Lucene build seems to work just fine.
> >> >>
> >> >> 3. You can build Solr independently (with a Lucene snapshot from
> >> >> Apache repository) by running:
> >> >>
> >> >> ./gradlew -Dskip.lucene=true assemble check -x test -x documentation
> >> >> -x checkBrokenLinks -x checkLocalJavadocLinksSite
> >> >>
> >> >> the "-x" tasks are what I'll need some help with - I guess they do
> >> >> have cross-references to Lucene-generated stuff that needs to be
> >> >> replaced (or dropped).
> >> >>
> >> >> 4. There are three tests that need to be removed from the codebase,
> >> >> moved to Lucene or rewritten: TestICUCollationField,
> >> >> TestLuceneIndexBackCompat and TestXmlQParser. These tests require
> >> >> access to Lucene test sources and these won't be published as an
> >> >> artifact.
> >> >>
> >> >> An alternative to trying to solve the above is to just split the repo
> >> >> and let it burn/ crash, but I thought by fixing those issues on
> >> >> current master branch we can prepare the infrastructure while
> >> >> everything else just works.
> >> >>
> >> >> Dawid
> >> >>


Re: Separate Solr build: help with the remaining last mile needed.

2021-03-04 Thread Dawid Weiss
Thanks. I might have broken something... but it seems unlikely. I
merely shuffled some things around - it should be an identical build.

D.

On Thu, Mar 4, 2021 at 7:45 PM Houston Putman  wrote:
>
> It worked as of 10 days ago: 
> https://github.com/apache/lucene-solr/actions/workflows/docker-test.yml
>
> I created a test PR to see if it works based on master: 
> https://github.com/apache/lucene-solr/pull/2454
>
> On Thu, Mar 4, 2021 at 1:39 PM Dawid Weiss  wrote:
>>
>> Thanks Houston. Is this failure on the branch only (does it work on master)?
>>
>> Dawid
>>
>> On Thu, Mar 4, 2021 at 6:48 PM Houston Putman  
>> wrote:
>> >
>> > I'm not sure why the docker github action is failing... I've tried it 
>> > locally and it works fine. I'll do some more investigation.
>> >
>> > On Thu, Mar 4, 2021 at 3:25 AM Dawid Weiss  wrote:
>> >>
>> >> Hi folks,
>> >>
>> >> I'll need some help with some remaining tasks to make the transition
>> >> easier after the solr repo split. I've made some changes to allow
>> >> building just Solr or just Lucene on master --
>> >>
>> >> https://github.com/apache/lucene-solr/pull/2448
>> >>
>> >> 1. I really don't know much about docker testing and why it fails on that 
>> >> PR.
>> >>
>> >> 2. Lucene build seems to work just fine.
>> >>
>> >> 3. You can build Solr independently (with a Lucene snapshot from
>> >> Apache repository) by running:
>> >>
>> >> ./gradlew -Dskip.lucene=true assemble check -x test -x documentation
>> >> -x checkBrokenLinks -x checkLocalJavadocLinksSite
>> >>
>> >> the "-x" tasks are what I'll need some help with - I guess they do
>> >> have cross-references to Lucene-generated stuff that needs to be
>> >> replaced (or dropped).
>> >>
>> >> 4. There are three tests that need to be removed from the codebase,
>> >> moved to Lucene or rewritten: TestICUCollationField,
>> >> TestLuceneIndexBackCompat and TestXmlQParser. These tests require
>> >> access to Lucene test sources and these won't be published as an
>> >> artifact.
>> >>
>> >> An alternative to trying to solve the above is to just split the repo
>> >> and let it burn/ crash, but I thought by fixing those issues on
>> >> current master branch we can prepare the infrastructure while
>> >> everything else just works.
>> >>
>> >> Dawid
>> >>



Re: Separate Solr build: help with the remaining last mile needed.

2021-03-04 Thread Houston Putman
It worked as of 10 days ago:
https://github.com/apache/lucene-solr/actions/workflows/docker-test.yml

I created a test PR to see if it works based on master:
https://github.com/apache/lucene-solr/pull/2454

On Thu, Mar 4, 2021 at 1:39 PM Dawid Weiss  wrote:

> Thanks Houston. Is this failure on the branch only (does it work on
> master)?
>
> Dawid
>
> On Thu, Mar 4, 2021 at 6:48 PM Houston Putman wrote:
> >
> > I'm not sure why the docker github action is failing... I've tried it
> > locally and it works fine. I'll do some more investigation.
> >
> > On Thu, Mar 4, 2021 at 3:25 AM Dawid Weiss wrote:
> >>
> >> Hi folks,
> >>
> >> I'll need some help with some remaining tasks to make the transition
> >> easier after the solr repo split. I've made some changes to allow
> >> building just Solr or just Lucene on master --
> >>
> >> https://github.com/apache/lucene-solr/pull/2448
> >>
> >> 1. I really don't know much about docker testing and why it fails on
> >> that PR.
> >>
> >> 2. Lucene build seems to work just fine.
> >>
> >> 3. You can build Solr independently (with a Lucene snapshot from
> >> Apache repository) by running:
> >>
> >> ./gradlew -Dskip.lucene=true assemble check -x test -x documentation
> >> -x checkBrokenLinks -x checkLocalJavadocLinksSite
> >>
> >> the "-x" tasks are what I'll need some help with - I guess they do
> >> have cross-references to Lucene-generated stuff that needs to be
> >> replaced (or dropped).
> >>
> >> 4. There are three tests that need to be removed from the codebase,
> >> moved to Lucene or rewritten: TestICUCollationField,
> >> TestLuceneIndexBackCompat and TestXmlQParser. These tests require
> >> access to Lucene test sources and these won't be published as an
> >> artifact.
> >>
> >> An alternative to trying to solve the above is to just split the repo
> >> and let it burn/ crash, but I thought by fixing those issues on
> >> current master branch we can prepare the infrastructure while
> >> everything else just works.
> >>
> >> Dawid
> >>


Re: Separate Solr build: help with the remaining last mile needed.

2021-03-04 Thread Dawid Weiss
Thanks Houston. Is this failure on the branch only (does it work on master)?

Dawid

On Thu, Mar 4, 2021 at 6:48 PM Houston Putman  wrote:
>
> I'm not sure why the docker github action is failing... I've tried it locally 
> and it works fine. I'll do some more investigation.
>
> On Thu, Mar 4, 2021 at 3:25 AM Dawid Weiss  wrote:
>>
>> Hi folks,
>>
>> I'll need some help with some remaining tasks to make the transition
>> easier after the solr repo split. I've made some changes to allow
>> building just Solr or just Lucene on master --
>>
>> https://github.com/apache/lucene-solr/pull/2448
>>
>> 1. I really don't know much about docker testing and why it fails on that PR.
>>
>> 2. Lucene build seems to work just fine.
>>
>> 3. You can build Solr independently (with a Lucene snapshot from
>> Apache repository) by running:
>>
>> ./gradlew -Dskip.lucene=true assemble check -x test -x documentation
>> -x checkBrokenLinks -x checkLocalJavadocLinksSite
>>
>> the "-x" tasks are what I'll need some help with - I guess they do
>> have cross-references to Lucene-generated stuff that needs to be
>> replaced (or dropped).
>>
>> 4. There are three tests that need to be removed from the codebase,
>> moved to Lucene or rewritten: TestICUCollationField,
>> TestLuceneIndexBackCompat and TestXmlQParser. These tests require
>> access to Lucene test sources and these won't be published as an
>> artifact.
>>
>> An alternative to trying to solve the above is to just split the repo
>> and let it burn/ crash, but I thought by fixing those issues on
>> current master branch we can prepare the infrastructure while
>> everything else just works.
>>
>> Dawid
>>



Re: Separate Solr build: help with the remaining last mile needed.

2021-03-04 Thread Houston Putman
I'm not sure why the docker github action is failing... I've tried it
locally and it works fine. I'll do some more investigation.

On Thu, Mar 4, 2021 at 3:25 AM Dawid Weiss  wrote:

> Hi folks,
>
> I'll need some help with some remaining tasks to make the transition
> easier after the solr repo split. I've made some changes to allow
> building just Solr or just Lucene on master --
>
> https://github.com/apache/lucene-solr/pull/2448
>
> 1. I really don't know much about docker testing and why it fails on that
> PR.
>
> 2. Lucene build seems to work just fine.
>
> 3. You can build Solr independently (with a Lucene snapshot from
> Apache repository) by running:
>
> ./gradlew -Dskip.lucene=true assemble check -x test -x documentation
> -x checkBrokenLinks -x checkLocalJavadocLinksSite
>
> the "-x" tasks are what I'll need some help with - I guess they do
> have cross-references to Lucene-generated stuff that needs to be
> replaced (or dropped).
>
> 4. There are three tests that need to be removed from the codebase,
> moved to Lucene or rewritten: TestICUCollationField,
> TestLuceneIndexBackCompat and TestXmlQParser. These tests require
> access to Lucene test sources and these won't be published as an
> artifact.
>
> An alternative to trying to solve the above is to just split the repo
> and let it burn/ crash, but I thought by fixing those issues on
> current master branch we can prepare the infrastructure while
> everything else just works.
>
> Dawid
>


Re: Configurable Postings Block Size?

2021-03-04 Thread Greg Miller
Thanks Robert. I've created
https://issues.apache.org/jira/browse/LUCENE-9822 and will attach a patch
shortly.

Cheers,
-Greg
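
The limit discussed in the quoted message below can be sketched in plain
Java. This is not Lucene's PForUtil; the class PatchOffsetDemo and its
methods are hypothetical and only illustrate why a one-byte patch offset
caps the block size at 256, and what an explicit guard might look like. The
thread proposes an assert; the sketch throws an exception instead so that the
check also fires when assertions are disabled.

```java
// Hypothetical illustration; this is not Lucene's PForUtil.
final class PatchOffsetDemo {

  // A single unsigned byte can address positions 0..255, so at most 256 slots.
  static final int MAX_BLOCK_SIZE_FOR_BYTE_OFFSETS = 256;

  // Stores the position of a patched value in one byte, guarding the block size.
  static byte encodeOffset(int index, int blockSize) {
    if (blockSize > MAX_BLOCK_SIZE_FOR_BYTE_OFFSETS) {
      throw new IllegalArgumentException(
          "blockSize " + blockSize + " is too large for one-byte patch offsets");
    }
    if (index < 0 || index >= blockSize) {
      throw new IndexOutOfBoundsException("index " + index);
    }
    return (byte) index;
  }

  // Reads the byte back as an unsigned value in the range 0..255.
  static int decodeOffset(byte stored) {
    return Byte.toUnsignedInt(stored);
  }

  // What happens without a guard: the high bits are silently dropped.
  static int roundTripUnchecked(int index) {
    return Byte.toUnsignedInt((byte) index);
  }

  public static void main(String[] args) {
    System.out.println(roundTripUnchecked(200)); // still 200, fits in one byte
    System.out.println(roundTripUnchecked(300)); // 44, because 300 - 256 = 44
  }
}
```

With a block size of 512, any patch at index 256 or above would decode to the
wrong slot, which matches the silently incorrect results described below.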

On Wed, Mar 3, 2021 at 6:21 PM Robert Muir  wrote:

> I think it's a good idea, especially if the assert can be in a good place
> (ideally a not-so-hot place, e.g. encoding, patching code). asserts have
> some costs for this kind of code even when disabled, bytecode count limits
> are used for compiler threshold and stuff.
>
> On Wed, Mar 3, 2021 at 9:05 PM Greg Miller  wrote:
>
>> So, slightly different topic, maybe, but related so tacking onto this
>> thread...
>>
>> While tweaking ForUtil locally to experiment with different block sizes,
>> I realized that PForUtil encodes the offset for each "patch" using a single
>> byte, which implies a strict upper limit of 256 on the BLOCK_SIZE defined
>> in ForUtil. This essentially silently failed on me when I was trying to set
>> up blocks of 512. The unit tests caught it since the results were incorrect
>> after encoding/decoding with PForUtil (hooray!), but it would have been
>> nice to have an assert somewhere guarding for this to make matters a little
>> more explicit.
>>
>> While I realize that the likelihood of changing the block size in ForUtil
>> may be low for now, it seems like such a small, easy change to toss an
>> assert in that it seems useful. What do you all think? Worth opening a
>> minor issue for this and putting in a one-liner?
>>
>> Cheers,
>> -Greg
>>
>> On Mon, Mar 1, 2021 at 11:30 AM Greg Miller  wrote:
>>
>>> Oh, got it. This is great, thanks!
>>>
>>> Cheers,
>>> -Greg
>>>
>>> On Mon, Mar 1, 2021 at 11:28 AM Robert Muir  wrote:
>>>
>>>> Yeah, have a look at gen_ForUtil.py
>>>>
>>>> On Mon, Mar 1, 2021 at 1:05 PM Greg Miller wrote:
>>>>
> Thanks for the feedback Robert; makes sense to me. I'll tinker with a
> forked codec and see if the experimentation produces anything interesting.
>
> When you mention "autogenerated decompression code", do you mean that
> some of this code is actually being generated?
>
> Cheers,
> -Greg
>
> On Sun, Feb 28, 2021 at 5:05 AM Robert Muir  wrote:
>
>> If you want to test a different block size (say 64 or 256), I really
>> recommend to just fork a different codec for the experiment.
>>
>> There will likely be higher level changes you need to make, not just
>> changing a number. For example if you just increased this number to 256
>> without doing anything else, I wouldn't be surprised if you see worse
>> performance. More of the postings would be vint-encoded than before with
>> 128, which might have some consequences. skipdata layout might be
>> inappropriate, these things are optimized for blocks of 128.
>>
>> Just in general, I recommend making a codec for the benchmarking
>> experiments, tools like luceneutil support comparing codecs against each
>> other anyway so you can easily compare fairly against the existing codec.
>> Also, it should be much easier/faster to just make a new codec and adapt
>> it to test what you want!
>>
>> I think it is an antipattern to make stuff within the codec
>> "flexible", it is autogenerated decompression code :) I am concerned such
>> "flexibility" would create barriers in the future to optimizations. For
>> example we should be able to experiment with converting this compression
>> code over to explicit vector API in java.
>>
>> On Sat, Feb 27, 2021 at 4:29 PM Greg Miller 
>> wrote:
>>
>>> Hi folks!
>>>
>>> I've been a bit curious to test out different block size
>>> configurations in the Lucene postings list format, but thought I'd reach
>>> out to the community here first to see what work may have gone into this
>>> previously. I'm essentially interested in benchmarking different block
>>> size configurations on the real-world application of Lucene I'm working on.
>>>
>>> If my understanding of the code is correct, I know we're currently
>>> encoding compressed runs of 128 docs per block, relying on ForUtil for
>>> encoding/decoding purposes. It looks like we define this in
>>> ForUtil#BLOCK_SIZE (and reference it in a few external classes), but also
>>> know that it's not as simple as just changing that one definition. It
>>> appears much of the logic in ForUtil relies on the assumption of 128
>>> docs-per-block.
>>>
>>> I'm toying with the idea of making ForUtil a bit more flexible to
>>> allow for different block sizes to be tested in order to run the
>>> benchmarking I'd like to run, but the class looks heavily optimized to
>>> generate SIMD instructions (I think?), so that might be folly. Before I
>>> start hacking on a local branch to see what I can learn, is there any prior
>>> work that might be useful to be aware of? Anyone gone down this path and
>>> have some 

Re: Serialization/Deserialization of Lucene objects like queries, sort fields etc

2021-03-04 Thread jitesh129
Thanks Mike for the quick response.

Michael McCandless-2 wrote
> Hello,
> 
>> Recently we upgraded our Lucene core libraries from v2.9.4 to v8.7.0
>
> Wow!  That is the biggest version jump I have heard of in some time :)  Did
> you have to also migrate an index all that way?  Or, you fully re-indexed
> once you were on 8.7.0?

Yeah, I agree it is one of the biggest jumps one could make in a library
upgrade. It happened mostly because the feature powered by Lucene had been
stable until now; with some new feature requests, we decided to upgrade the
library.

We are re-indexing everything on 8.7.0 instead of migrating the index
incrementally.

>> as all the lucene classes have been changed to non-serializable and since
>> we are using RPC for invoking the search queries we ran into bunch of
>> NotSerializableException.
> 
> Long ago (I think perhaps in 4.0 release) we decided to remove "implements
> Serializable" from all Lucene classes.  I think this is the (contentious!
> its title suggests just the opposite!) issue:
> https://issues.apache.org/jira/browse/LUCENE-1473.  We did this because 1)
> Lucene is meant to be a performant, feature rich search engine for a
> *single* JVM/machine, not (yet) a fully distributed search engine, and 2)
> the backwards compatibility implications of truly supporting serializable
> so that one could drop in a new major version of Lucene and expect it to
> correctly/efficiently communicate over-the-wire with older Lucene versions
> was just too scary and high a requirement for ongoing development.
> 
> So the Lucene committers long ago decided that it is better to leave such
> serialization to the application or distributed search engine running on
> top of Lucene.  It is a non-feature for Lucene.
> 
>> I went through various forums which suggest either to use toString()
>> method of queries and then use query parser at the receiver end
>
> Alas, this will also not work.  Our Query.toString() implementations do not
> guarantee that they will always produce a String which, when round-tripped
> through a QueryParser (which QueryParser?), will return the same (according
> to .equals()) Query object.  This was also decided at one point to be a
> hopelessly high bar to hold our .toString() methods to.  That said, many
> Query.toString() implementations do work like this, making the situation
> feel trappy :(  Maybe we should consistently add a disclaimer to all
> Query.toString() making this non-feature clear?  At least to Query.java's
> toString, which currently seems to have no such warning:
>
>   /**
>    * Prints a query to a string, with <code>field</code> assumed to be the
>    * default field and omitted.
>    */
>   public abstract String toString(String field);

This we learned the hard way.
> 
> These topics have been discussed many times over the years -- it is clearly
> a big need for search applications!  And I agree, is missing now in Lucene.
>
>> Could someone please point me towards correct way of serialization and
>> deserialization of Lucene objects.
>
> Perhaps look at how Solr or Elasticsearch (hmm, <= 7.10 sources, when
> Elasticsearch was still open-licensed) and borrow/fork/poach those
> implementations?

Thanks for pointing me in this direction. I will have a look at the Solr and
Elasticsearch implementations.
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> 
> On Thu, Mar 4, 2021 at 5:14 AM jitesh129 wrote:
> 
>> Hello All,
>>
>> Recently we upgraded our Lucene core libraries from v2.9.4 to v8.7.0 and as
>> all the lucene classes have been changed to non-serializable and since we
>> are using RPC for invoking the search queries we ran into bunch of
>> NotSerializableException.
>>
>> I went through various forums which suggest either to use toString() method
>> of queries and then use query parser at the receiver end to convert it back
>> to Query objects. This fixed the NotSerializableException issue but the
>> behaviour of queries and filters were not correct now. While looking into
>> these issues we identified that this could be because of toString and query
>> parsing not returning the equivalent query objects.
>>
>> Hence we again started looking for other serialization options and got a
>> reference of using Kryo serializers for the same purpose. But using Kryo
>> serializers we are running into buffer overflow and some time running into
>> ClassCastException for BooleanClause$Occur.
>>
>> Could someone please point me towards correct way of serialization and
>> deserialization of Lucene objects.
>>
>>
>>
>> --
>> Sent from:
>> https://lucene.472066.n3.nabble.com/Lucene-Java-Developer-f564358.html
>>

Re: Serialization/Deserialization of Lucene objects like queries, sort fields etc

2021-03-04 Thread Michael McCandless
Hello,

> Recently we upgraded our Lucene core libraries from v2.9.4 to v8.7.0

Wow!  That is the biggest version jump I have heard of in some time :)  Did
you have to also migrate an index all that way?  Or did you fully re-index
once you were on 8.7.0?

> as all the lucene classes have been changed to non-serializable and since
we are using RPC for invoking the search queries we ran into bunch of
NotSerializableException.

Long ago (I think perhaps in the 4.0 release) we decided to remove "implements
Serializable" from all Lucene classes.  I think this is the (contentious!
its title suggests just the opposite!) issue:
https://issues.apache.org/jira/browse/LUCENE-1473.  We did this because 1)
Lucene is meant to be a performant, feature-rich search engine for a
*single* JVM/machine, not (yet) a fully distributed search engine, and 2)
truly supporting serialization, so that one could drop in a new major
version of Lucene and expect it to correctly/efficiently communicate
over-the-wire with older Lucene versions, has backwards compatibility
implications that were just too scary a requirement for ongoing development.

So the Lucene committers long ago decided that it is better to leave such
serialization to the application or distributed search engine running on
top of Lucene.  It is a non-feature for Lucene.
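
To make "leave serialization to the application" concrete, here is a minimal
sketch of an application-owned wire format. Everything in it is hypothetical
and not part of Lucene: the QueryWire class, the Term/Clause/Bool records and
the tag bytes are made up for illustration, and it covers only term and
boolean queries. The occur strings are assumed to carry the names of
BooleanClause.Occur values such as MUST or MUST_NOT.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical application-level query description; not a Lucene class. */
final class QueryWire {
    /** A query node that the application owns and versions itself. */
    public interface Spec {}
    public record Term(String field, String text) implements Spec {}
    public record Clause(String occur, Spec query) {}
    public record Bool(List<Clause> clauses) implements Spec {}

    // Made-up type tags for the wire format.
    private static final byte TERM = 1, BOOL = 2;

    /** Serializes a spec tree to bytes that can be sent over RPC. */
    public static byte[] encode(Spec spec) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            write(spec, out);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds the spec tree on the receiving side. */
    public static Spec decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return read(in);
        }
    }

    private static void write(Spec spec, DataOutputStream out) throws IOException {
        if (spec instanceof Term t) {
            out.writeByte(TERM);
            out.writeUTF(t.field());
            out.writeUTF(t.text());
        } else if (spec instanceof Bool b) {
            out.writeByte(BOOL);
            out.writeInt(b.clauses().size());
            for (Clause c : b.clauses()) {
                out.writeUTF(c.occur());
                write(c.query(), out);
            }
        } else {
            throw new IllegalArgumentException("unknown spec: " + spec);
        }
    }

    private static Spec read(DataInputStream in) throws IOException {
        byte tag = in.readByte();
        switch (tag) {
            case TERM:
                return new Term(in.readUTF(), in.readUTF());
            case BOOL: {
                int n = in.readInt();
                List<Clause> clauses = new ArrayList<>(n);
                for (int i = 0; i < n; i++) {
                    String occur = in.readUTF();
                    clauses.add(new Clause(occur, read(in)));
                }
                return new Bool(clauses);
            }
            default:
                throw new IOException("unknown tag: " + tag);
        }
    }
}
```

The receiver would then translate the decoded tree into real Query objects
(for example with TermQuery and BooleanQuery.Builder), so only the
application's own format crosses the wire and Lucene classes never need to be
serializable.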

> I went through various forums which suggest either to use toString()
method of queries and then use query parser at the receiver end

Alas, this will also not work.  Our Query.toString() implementations do not
guarantee that they will always produce a String which, when round-tripped
through a QueryParser (which QueryParser?), will return the same (according
to .equals()) Query object.  This was also decided at one point to be a
hopelessly high bar to hold our .toString() methods to.  That said, many
Query.toString() implementations do work like this, making the situation
feel trappy :(  Maybe we should consistently add a disclaimer to all
Query.toString() making this non-feature clear?  At least to Query.java's
toString, which currently seems to have no such warning:

  /**
   * Prints a query to a string, with field assumed to be the
   * default field and omitted.
   */
  public abstract String toString(String field);
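
A toy model of why the round trip can go wrong, using no Lucene classes at
all: TermSpec and the naive parse() below are made up for illustration and
only mimic a "field:text" rendering and a whitespace-splitting parser. The
point is that a rendering meant for humans does not escape anything, so a
parser cannot tell where a term ends.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration only; TermSpec and parse() are hypothetical, not Lucene APIs. */
final class ToStringTrap {
    record TermSpec(String field, String text) {
        /** Human-readable "field:text" rendering; whitespace in the term is not escaped. */
        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    /** Naive parser: whitespace separates clauses, and the "field:" prefix is optional. */
    static List<TermSpec> parse(String s, String defaultField) {
        List<TermSpec> out = new ArrayList<>();
        for (String tok : s.trim().split("\\s+")) {
            int colon = tok.indexOf(':');
            out.add(colon < 0
                ? new TermSpec(defaultField, tok)
                : new TermSpec(tok.substring(0, colon), tok.substring(colon + 1)));
        }
        return out;
    }
}
```

With this toy, a single term ("city", "new york") renders as "city:new york"
and parses back as two terms, one on "city" and one on the default field, so
the parsed result is not equal to what was sent.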

These topics have been discussed many times over the years -- it is clearly
a big need for search applications!  And I agree, it is missing now in Lucene.

> Could someone please point me towards correct way of serialization and
deserialization of Lucene objects.

Perhaps look at how Solr or Elasticsearch (hmm, <= 7.10 sources, when
Elasticsearch was still open-licensed) do this, and borrow/fork/poach those
implementations?

Mike McCandless

http://blog.mikemccandless.com


On Thu, Mar 4, 2021 at 5:14 AM jitesh129  wrote:

> Hello All,
>
> Recently we upgraded our Lucene core libraries from v2.9.4 to v8.7.0 and as
> all the lucene classes have been changed to non-serializable and since we
> are using RPC for invoking the search queries we ran into bunch of
> NotSerializableException.
>
> I went through various forums which suggest either to use toString() method
> of queries and then use query parser at the receiver end to convert it back
> to Query objects. This fixed the NotSerializableException issue but the
> behaviour of queries and filters were not correct now. While looking into
> these issues we identified that this could be because of toString and query
> parising not returning the equivalent query objects.
>
> Hence we again started looking for other serialization options and got a
> reference of using Kryo serializers for the same purpose. But using Kryo
> serializers we are running into buffer overflow and some time running into
> ClassCastException for BooleanClause$Occur.
>
> Could someone please point me towards correct way of serialization and
> deserialization of Lucene objects.
>
>
>
> --
> Sent from:
> https://lucene.472066.n3.nabble.com/Lucene-Java-Developer-f564358.html
>
>
>


Re: Solr webpage SEO

2021-03-04 Thread Uwe Schindler
I disagree with excluding older Javadocs or the refguide using robots! When I look 
for documentation of a class I generally enter the class name and version number 
into Google.

We can maybe handle this with priorities inside a sitemap.xml or custom HTTP 
headers (X-Robots-Tag) using an .htaccess rule. I can check this out.

Uwe

On March 4, 2021 10:41:52 AM UTC, "Jan Høydahl" wrote:
>Sure we could do robots.
>
>But I suspect that we put ourselves in this situation through
>https://issues.apache.org/jira/browse/SOLR-10595 ourselves
>Check out the attachment solr_redirects.conf on that JIRA (also here
>https://gist.github.com/janhoy/a3149e1ed27df020194a2de1a7fa2c16)
>
>Here, we explicitly map all pages that used to be in Confluence (which
>there are a ton of links to on the net), to version 6.6 of the guide.
>Of course, if we change those to "latest", some of those links will
>break, but perhaps it would still be better?
>Or can we be more intelligent in the rewrite rules on Solr site - that
>if you try a "/guide/foo.html" link and it is not found, that you
>display a custom error page or go to front page of latest guide?
>
>Jan
>
>> On 4 Mar 2021, at 10:21, Ishan Chattopadhyaya wrote:
>> 
>> We can add robots.txt to stop Google from indexing/showing in
>results.
>> 
>> On Thu, 4 Mar, 2021, 2:34 pm Jan Høydahl wrote:
>> Hi, sending to this list since dev@solr list is not yet announced
>properly.
>> 
>> We have a few days of traffic to the new site and can see the most
>visited pages at https://uls.apache.org/exports/solr.apache.org.yaml
> (see copy below).
>> When I search google for "solr query parser", I get the 6.6 guide on
>top, which is probably why /guide/6_6/the-standard-query-parser.html
>shows up, and the same for the other /guide/6_6/ links. 
>> Some questions:
>> 
>> How can we make Google forget about version 6.6? I know we had a
>bunch of redirects from Confluence to the 6.6 guide, are they still in
>place?
>> Why is /docs/6_6_0/solr-core/index.html the 2nd most visited page?
>Anywhere that links to it?
>> Why is /docs/4_8_1/solr-solrj/index.html so high? Ahywhere that links
>to it?
>> The /mirrors-solr-latest-redir.html redirect was not working. I just
>pushed a fix
>> 
>> 
>> Sheet3:
>>   Name: Most visited pages, past month
>>   Values:
>> /index.html: 443
>> /docs/6_6_0/solr-core/index.html: 281
>> /guide/8_8/solr-tutorial.html: 104
>> /news.html: 92
>> /guide/solr-tutorial.html: 91
>> /resources.html: 75
>> /features.html: 69
>> /docs/8_7_0/solr-core/index.html: 68
>> /downloads.html: 68
>> /guide/6_6/the-standard-query-parser.html: 65
>> /docs/4_8_1/solr-solrj/index.html: 62
>> /guide/6_6/common-query-parameters.html: 50
>> /guide/index.html: 46
>> /docs/8_8_1/solr-solrj/index.html: 44
>> /docs/8_7_0/solr-solrj/index.html: 38
>> /docs/8_8_1/solr-core/index.html: 37
>> /community.html: 24
>> /guide/8_8/: 23
>> /guide/6_6/uploading-data-with-index-handlers.html: 22
>> /docs/8_6_3/solr-core/index.html: 21
>> /guide/6_6/filter-descriptions.html: 21
>> /guide/6_6/collections-api.html: 18
>> /docs/8_6_2/solr-solrj/overview-summary.html: 16
>> /guide/6_6/faceting.html: 16
>> /mirrors-solr-latest-redir.html: 15
>> /whoweare.html: 15
>> /guide/6_6/solrcloud.html: 14
>> /guide/6_6/tokenizers.html: 13
>> /guide/7_0/solr-configuration-files.html: 13
>> /guide/8_8/query-syntax-and-parsing.html: 13
>> /security.html: 13
>> /guide/6_6/introduction-to-solr-indexing.html: 12
>> /guide/8_8/solr-upgrade-notes.html: 12
>> /docs/8_0_0/solr-solrj/allclasses-frame.html: 11
>> /docs/8_6_3/solr-solrj/index.html: 11
>> /guide/6_6/the-dismax-query-parser.html: 11
>> /guide/8_8/getting-started.html: 11
>> /guide/solr-upgrade-notes.html: 11
>> /docs/8_0_0/solr-solrj/overview-summary.html: 10
>> /docs/8_6_2/solr-solrj/index.html: 10
>> /guide/6_6/running-solr.html: 10
>> /docs/7_2_1/solr-solrj/overview-summary.html: 9
>> /docs/8_0_0/solr-solrj/overview-frame.html: 9
>> /guide/6_6/format-of-solr-xml.html: 9
>> /guide/6_6/index.html: 9
>> /guide/6_6/learning-to-rank.html: 9
>> /guide/6_6/making-and-restoring-backups.html: 9
>> /guide/6_6/working-with-dates.html: 9
>> /guide/8_0/reindexing.html: 9
>> /docs/7_2_1/solr-solrj/allclasses-frame.html: 8
>> 
>> 

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de

Re: Solr webpage SEO

2021-03-04 Thread Uwe Schindler
We can change the Confluence redirects to use the URL without a version number. 
The .htaccess of the Solr website then redirects automatically to the latest 
version of the refguide. This is set by a Pelican variable at website deployment.

This link redirects automatically, so if we change the Confluence redirects to 
use no version number it works fine:
https://solr.apache.org/guide/understanding-analyzers-tokenizers-and-filters.html
 (also with hashes!)
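
As a rough sketch of what "use the URL without version number" means for a
redirect target, assuming versioned guide paths always have the shape
/guide/<major>_<minor>/<page> as in the traffic list. GuidePaths and
unversioned() are hypothetical names for illustration; the real rules live in
the Apache redirect configuration, not in Java.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; it only illustrates the path mapping. */
final class GuidePaths {
    // Assumed shape of a versioned guide path: /guide/<major>_<minor>/<page>
    private static final Pattern VERSIONED = Pattern.compile("^/guide/\\d+_\\d+/(.*)$");

    /** Drops the version segment from a guide path; other paths are returned unchanged. */
    static String unversioned(String path) {
        Matcher m = VERSIONED.matcher(path);
        return m.matches() ? "/guide/" + m.group(1) : path;
    }
}
```

For example, /guide/6_6/the-standard-query-parser.html would map to
/guide/the-standard-query-parser.html, which the site then redirects to the
latest guide version, while /docs/... paths are left alone.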

We can't do any dynamic tricks (such as checking whether the page exists), as 
redirects can't be dynamic on Apache's servers. We have no PHP or similar to 
implement a dynamic 404 page.

Uwe

On March 4, 2021 10:41:52 AM UTC, "Jan Høydahl" wrote:
>Sure we could do robots.
>
>But I suspect that we put ourselves in this situation through
>https://issues.apache.org/jira/browse/SOLR-10595 ourselves
>Check out the attachment solr_redirects.conf on that JIRA (also here
>https://gist.github.com/janhoy/a3149e1ed27df020194a2de1a7fa2c16)
>
>Here, we explicitly map all pages that used to be in Confluence (which
>there are a ton of links to on the net), to version 6.6 of the guide.
>Of course, if we change those to "latest", some of those links will
>break, but perhaps it would still be better?
>Or can we be more intelligent in the rewrite rules on Solr site - that
>if you try a "/guide/foo.html" link and it is not found, that you
>display a custom error page or go to front page of latest guide?
>
>Jan
>
>> On 4 Mar 2021, at 10:21, Ishan Chattopadhyaya wrote:
>> 
>> We can add robots.txt to stop Google from indexing/showing in
>results.
>> 
>> On Thu, 4 Mar, 2021, 2:34 pm Jan Høydahl wrote:
>> Hi, sending to this list since dev@solr list is not yet announced
>properly.
>> 
>> We have a few days of traffic to the new site and can see the most
>visited pages at https://uls.apache.org/exports/solr.apache.org.yaml
> (see copy below).
>> When I search google for "solr query parser", I get the 6.6 guide on
>top, which is probably why /guide/6_6/the-standard-query-parser.html
>shows up, and the same for the other /guide/6_6/ links. 
>> Some questions:
>> 
>> How can we make Google forget about version 6.6? I know we had a
>bunch of redirects from Confluence to the 6.6 guide, are they still in
>place?
>> Why is /docs/6_6_0/solr-core/index.html the 2nd most visited page?
>Anywhere that links to it?
>> Why is /docs/4_8_1/solr-solrj/index.html so high? Ahywhere that links
>to it?
>> The /mirrors-solr-latest-redir.html redirect was not working. I just
>pushed a fix
>> 
>> 
>> Sheet3:
>>   Name: Most visited pages, past month
>>   Values:
>> /index.html: 443
>> /docs/6_6_0/solr-core/index.html: 281
>> /guide/8_8/solr-tutorial.html: 104
>> /news.html: 92
>> /guide/solr-tutorial.html: 91
>> /resources.html: 75
>> /features.html: 69
>> /docs/8_7_0/solr-core/index.html: 68
>> /downloads.html: 68
>> /guide/6_6/the-standard-query-parser.html: 65
>> /docs/4_8_1/solr-solrj/index.html: 62
>> /guide/6_6/common-query-parameters.html: 50
>> /guide/index.html: 46
>> /docs/8_8_1/solr-solrj/index.html: 44
>> /docs/8_7_0/solr-solrj/index.html: 38
>> /docs/8_8_1/solr-core/index.html: 37
>> /community.html: 24
>> /guide/8_8/: 23
>> /guide/6_6/uploading-data-with-index-handlers.html: 22
>> /docs/8_6_3/solr-core/index.html: 21
>> /guide/6_6/filter-descriptions.html: 21
>> /guide/6_6/collections-api.html: 18
>> /docs/8_6_2/solr-solrj/overview-summary.html: 16
>> /guide/6_6/faceting.html: 16
>> /mirrors-solr-latest-redir.html: 15
>> /whoweare.html: 15
>> /guide/6_6/solrcloud.html: 14
>> /guide/6_6/tokenizers.html: 13
>> /guide/7_0/solr-configuration-files.html: 13
>> /guide/8_8/query-syntax-and-parsing.html: 13
>> /security.html: 13
>> /guide/6_6/introduction-to-solr-indexing.html: 12
>> /guide/8_8/solr-upgrade-notes.html: 12
>> /docs/8_0_0/solr-solrj/allclasses-frame.html: 11
>> /docs/8_6_3/solr-solrj/index.html: 11
>> /guide/6_6/the-dismax-query-parser.html: 11
>> /guide/8_8/getting-started.html: 11
>> /guide/solr-upgrade-notes.html: 11
>> /docs/8_0_0/solr-solrj/overview-summary.html: 10
>> /docs/8_6_2/solr-solrj/index.html: 10
>> /guide/6_6/running-solr.html: 10
>> /docs/7_2_1/solr-solrj/overview-summary.html: 9
>> /docs/8_0_0/solr-solrj/overview-frame.html: 9
>> /guide/6_6/format-of-solr-xml.html: 9
>> /guide/6_6/index.html: 9
>> /guide/6_6/learning-to-rank.html: 9
>> /guide/6_6/making-and-restoring-backups.html: 9
>> /guide/6_6/working-with-dates.html: 9
>> /guide/8_0/reindexing.html: 9
>> /docs/7_2_1/solr-solrj/allclasses-frame.html: 8
>> 
>> 

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de

Re: Solr webpage SEO

2021-03-04 Thread Jan Høydahl
Sure, we could use robots.txt.

But I suspect that we put ourselves in this situation through 
https://issues.apache.org/jira/browse/SOLR-10595.
Check out the attachment solr_redirects.conf on that JIRA (also here: 
https://gist.github.com/janhoy/a3149e1ed27df020194a2de1a7fa2c16).

There, we explicitly map all pages that used to be in Confluence (to which there 
are a ton of links on the net) to version 6.6 of the guide.
Of course, if we change those to "latest", some of those links will break, but 
perhaps it would still be better?
Or can we be more intelligent in the rewrite rules on the Solr site, so that if 
you try a "/guide/foo.html" link and it is not found, you get a custom 
error page or are sent to the front page of the latest guide?

Jan

> On 4 Mar 2021, at 10:21, Ishan Chattopadhyaya wrote:
> 
> We can add robots.txt to stop Google from indexing/showing in results.
> 
> On Thu, 4 Mar, 2021, 2:34 pm Jan Høydahl wrote:
> Hi, sending to this list since dev@solr list is not yet announced properly.
> 
> We have a few days of traffic to the new site and can see the most visited 
> pages at https://uls.apache.org/exports/solr.apache.org.yaml 
>  (see copy below).
> When I search google for "solr query parser", I get the 6.6 guide on top, 
> which is probably why /guide/6_6/the-standard-query-parser.html shows up, and 
> the same for the other /guide/6_6/ links. 
> Some questions:
> 
> How can we make Google forget about version 6.6? I know we had a bunch of 
> redirects from Confluence to the 6.6 guide, are they still in place?
> Why is /docs/6_6_0/solr-core/index.html the 2nd most visited page? Anywhere 
> that links to it?
> Why is /docs/4_8_1/solr-solrj/index.html so high? Ahywhere that links to it?
> The /mirrors-solr-latest-redir.html redirect was not working. I just pushed a 
> fix
> 
> 
> Sheet3:
>   Name: Most visited pages, past month
>   Values:
> /index.html: 443
> /docs/6_6_0/solr-core/index.html: 281
> /guide/8_8/solr-tutorial.html: 104
> /news.html: 92
> /guide/solr-tutorial.html: 91
> /resources.html: 75
> /features.html: 69
> /docs/8_7_0/solr-core/index.html: 68
> /downloads.html: 68
> /guide/6_6/the-standard-query-parser.html: 65
> /docs/4_8_1/solr-solrj/index.html: 62
> /guide/6_6/common-query-parameters.html: 50
> /guide/index.html: 46
> /docs/8_8_1/solr-solrj/index.html: 44
> /docs/8_7_0/solr-solrj/index.html: 38
> /docs/8_8_1/solr-core/index.html: 37
> /community.html: 24
> /guide/8_8/: 23
> /guide/6_6/uploading-data-with-index-handlers.html: 22
> /docs/8_6_3/solr-core/index.html: 21
> /guide/6_6/filter-descriptions.html: 21
> /guide/6_6/collections-api.html: 18
> /docs/8_6_2/solr-solrj/overview-summary.html: 16
> /guide/6_6/faceting.html: 16
> /mirrors-solr-latest-redir.html: 15
> /whoweare.html: 15
> /guide/6_6/solrcloud.html: 14
> /guide/6_6/tokenizers.html: 13
> /guide/7_0/solr-configuration-files.html: 13
> /guide/8_8/query-syntax-and-parsing.html: 13
> /security.html: 13
> /guide/6_6/introduction-to-solr-indexing.html: 12
> /guide/8_8/solr-upgrade-notes.html: 12
> /docs/8_0_0/solr-solrj/allclasses-frame.html: 11
> /docs/8_6_3/solr-solrj/index.html: 11
> /guide/6_6/the-dismax-query-parser.html: 11
> /guide/8_8/getting-started.html: 11
> /guide/solr-upgrade-notes.html: 11
> /docs/8_0_0/solr-solrj/overview-summary.html: 10
> /docs/8_6_2/solr-solrj/index.html: 10
> /guide/6_6/running-solr.html: 10
> /docs/7_2_1/solr-solrj/overview-summary.html: 9
> /docs/8_0_0/solr-solrj/overview-frame.html: 9
> /guide/6_6/format-of-solr-xml.html: 9
> /guide/6_6/index.html: 9
> /guide/6_6/learning-to-rank.html: 9
> /guide/6_6/making-and-restoring-backups.html: 9
> /guide/6_6/working-with-dates.html: 9
> /guide/8_0/reindexing.html: 9
> /docs/7_2_1/solr-solrj/allclasses-frame.html: 8
> 
> 



Serialization/Deserialization of Lucene objects like queries, sort fields etc

2021-03-04 Thread jitesh129
Hello All,

Recently we upgraded our Lucene core libraries from v2.9.4 to v8.7.0. As
all the Lucene classes have been changed to non-serializable, and since we
are using RPC for invoking the search queries, we ran into a bunch of
NotSerializableExceptions.

I went through various forums, which suggest using the toString() method
of queries and then using a query parser at the receiver end to convert the
string back to Query objects. This fixed the NotSerializableException issue,
but the behaviour of queries and filters was no longer correct. While looking
into these issues we identified that this could be because toString and query
parsing do not return equivalent query objects.

Hence we again started looking for other serialization options and found a
reference to using Kryo serializers for the same purpose. But using Kryo
serializers we are running into buffer overflows and sometimes into
ClassCastException for BooleanClause$Occur.

Could someone please point me towards the correct way of serializing and
deserializing Lucene objects?



--
Sent from: 
https://lucene.472066.n3.nabble.com/Lucene-Java-Developer-f564358.html




Re: Solr webpage SEO

2021-03-04 Thread Ishan Chattopadhyaya
We can add robots.txt to stop Google from indexing/showing in results.

On Thu, 4 Mar, 2021, 2:34 pm Jan Høydahl wrote:

> Hi, sending to this list since dev@solr list is not yet announced
> properly.
>
> We have a few days of traffic to the new site and can see the most visited
> pages at https://uls.apache.org/exports/solr.apache.org.yaml (see copy
> below).
> When I search google for "solr query parser", I get the 6.6 guide on top,
> which is probably why /guide/6_6/the-standard-query-parser.html shows up,
> and the same for the other /guide/6_6/ links.
> Some questions:
>
>
>- How can we make Google forget about version 6.6? I know we had a
>bunch of redirects from Confluence to the 6.6 guide, are they still in
>place?
>- Why is /docs/6_6_0/solr-core/index.html the 2nd most visited page?
>Anywhere that links to it?
>- Why is /docs/4_8_1/solr-solrj/index.html so high? Ahywhere that
>links to it?
>- The /mirrors-solr-latest-redir.html redirect was not working. I just
>pushed a fix
>
>
>
> Sheet3:
>   Name: Most visited pages, past month
>   Values:
> /index.html: 443
> /docs/6_6_0/solr-core/index.html: 281
> /guide/8_8/solr-tutorial.html: 104
> /news.html: 92
> /guide/solr-tutorial.html: 91
> /resources.html: 75
> /features.html: 69
> /docs/8_7_0/solr-core/index.html: 68
> /downloads.html: 68
> /guide/6_6/the-standard-query-parser.html: 65
> /docs/4_8_1/solr-solrj/index.html: 62
> /guide/6_6/common-query-parameters.html: 50
> /guide/index.html: 46
> /docs/8_8_1/solr-solrj/index.html: 44
> /docs/8_7_0/solr-solrj/index.html: 38
> /docs/8_8_1/solr-core/index.html: 37
> /community.html: 24
> /guide/8_8/: 23
> /guide/6_6/uploading-data-with-index-handlers.html: 22
> /docs/8_6_3/solr-core/index.html: 21
> /guide/6_6/filter-descriptions.html: 21
> /guide/6_6/collections-api.html: 18
> /docs/8_6_2/solr-solrj/overview-summary.html: 16
> /guide/6_6/faceting.html: 16
> /mirrors-solr-latest-redir.html: 15
> /whoweare.html: 15
> /guide/6_6/solrcloud.html: 14
> /guide/6_6/tokenizers.html: 13
> /guide/7_0/solr-configuration-files.html: 13
> /guide/8_8/query-syntax-and-parsing.html: 13
> /security.html: 13
> /guide/6_6/introduction-to-solr-indexing.html: 12
> /guide/8_8/solr-upgrade-notes.html: 12
> /docs/8_0_0/solr-solrj/allclasses-frame.html: 11
> /docs/8_6_3/solr-solrj/index.html: 11
> /guide/6_6/the-dismax-query-parser.html: 11
> /guide/8_8/getting-started.html: 11
> /guide/solr-upgrade-notes.html: 11
> /docs/8_0_0/solr-solrj/overview-summary.html: 10
> /docs/8_6_2/solr-solrj/index.html: 10
> /guide/6_6/running-solr.html: 10
> /docs/7_2_1/solr-solrj/overview-summary.html: 9
> /docs/8_0_0/solr-solrj/overview-frame.html: 9
> /guide/6_6/format-of-solr-xml.html: 9
> /guide/6_6/index.html: 9
> /guide/6_6/learning-to-rank.html: 9
> /guide/6_6/making-and-restoring-backups.html: 9
> /guide/6_6/working-with-dates.html: 9
> /guide/8_0/reindexing.html: 9
> /docs/7_2_1/solr-solrj/allclasses-frame.html: 8
>
>
>


Solr webpage SEO

2021-03-04 Thread Jan Høydahl
Hi, sending to this list since the dev@solr list is not yet announced properly.

We have a few days of traffic to the new site and can see the most visited 
pages at https://uls.apache.org/exports/solr.apache.org.yaml (see copy below).
When I search Google for "solr query parser", I get the 6.6 guide on top, which 
is probably why /guide/6_6/the-standard-query-parser.html shows up, and the 
same for the other /guide/6_6/ links. 
Some questions:

- How can we make Google forget about version 6.6? I know we had a bunch of 
  redirects from Confluence to the 6.6 guide; are they still in place?
- Why is /docs/6_6_0/solr-core/index.html the 2nd most visited page? Does 
  anything link to it?
- Why is /docs/4_8_1/solr-solrj/index.html so high? Does anything link to it?
- The /mirrors-solr-latest-redir.html redirect was not working. I just pushed a 
  fix.


Sheet3:
  Name: Most visited pages, past month
  Values:
    /index.html: 443
    /docs/6_6_0/solr-core/index.html: 281
    /guide/8_8/solr-tutorial.html: 104
    /news.html: 92
    /guide/solr-tutorial.html: 91
    /resources.html: 75
    /features.html: 69
    /docs/8_7_0/solr-core/index.html: 68
    /downloads.html: 68
    /guide/6_6/the-standard-query-parser.html: 65
    /docs/4_8_1/solr-solrj/index.html: 62
    /guide/6_6/common-query-parameters.html: 50
    /guide/index.html: 46
    /docs/8_8_1/solr-solrj/index.html: 44
    /docs/8_7_0/solr-solrj/index.html: 38
    /docs/8_8_1/solr-core/index.html: 37
    /community.html: 24
    /guide/8_8/: 23
    /guide/6_6/uploading-data-with-index-handlers.html: 22
    /docs/8_6_3/solr-core/index.html: 21
    /guide/6_6/filter-descriptions.html: 21
    /guide/6_6/collections-api.html: 18
    /docs/8_6_2/solr-solrj/overview-summary.html: 16
    /guide/6_6/faceting.html: 16
    /mirrors-solr-latest-redir.html: 15
    /whoweare.html: 15
    /guide/6_6/solrcloud.html: 14
    /guide/6_6/tokenizers.html: 13
    /guide/7_0/solr-configuration-files.html: 13
    /guide/8_8/query-syntax-and-parsing.html: 13
    /security.html: 13
    /guide/6_6/introduction-to-solr-indexing.html: 12
    /guide/8_8/solr-upgrade-notes.html: 12
    /docs/8_0_0/solr-solrj/allclasses-frame.html: 11
    /docs/8_6_3/solr-solrj/index.html: 11
    /guide/6_6/the-dismax-query-parser.html: 11
    /guide/8_8/getting-started.html: 11
    /guide/solr-upgrade-notes.html: 11
    /docs/8_0_0/solr-solrj/overview-summary.html: 10
    /docs/8_6_2/solr-solrj/index.html: 10
    /guide/6_6/running-solr.html: 10
    /docs/7_2_1/solr-solrj/overview-summary.html: 9
    /docs/8_0_0/solr-solrj/overview-frame.html: 9
    /guide/6_6/format-of-solr-xml.html: 9
    /guide/6_6/index.html: 9
    /guide/6_6/learning-to-rank.html: 9
    /guide/6_6/making-and-restoring-backups.html: 9
    /guide/6_6/working-with-dates.html: 9
    /guide/8_0/reindexing.html: 9
    /docs/7_2_1/solr-solrj/allclasses-frame.html: 8




Separate Solr build: help with the remaining last mile needed.

2021-03-04 Thread Dawid Weiss
Hi folks,

I'll need some help with some remaining tasks to make the transition
easier after the solr repo split. I've made some changes to allow
building just Solr or just Lucene on master --

https://github.com/apache/lucene-solr/pull/2448

1. I really don't know much about docker testing and why it fails on that PR.

2. Lucene build seems to work just fine.

3. You can build Solr independently (with a Lucene snapshot from
Apache repository) by running:

./gradlew -Dskip.lucene=true assemble check -x test -x documentation
-x checkBrokenLinks -x checkLocalJavadocLinksSite

The "-x" tasks are what I'll need some help with. I guess they have
cross-references to Lucene-generated stuff that need to be replaced
(or dropped).

4. There are three tests that need to be removed from the codebase,
moved to Lucene or rewritten: TestICUCollationField,
TestLuceneIndexBackCompat and TestXmlQParser. These tests require
access to Lucene test sources and these won't be published as an
artifact.

An alternative to trying to solve the above is to just split the repo
and let it burn/crash, but I thought that by fixing those issues on the
current master branch we can prepare the infrastructure while
everything else just works.

Dawid
