Re: using HttpSolrServer with PoolingHttpClientConnectionManager

2017-03-01 Thread Renee Sun
Thank you Shawn! This is very helpful.

Renee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/using-HttpSolrServer-with-PoolingHttpClientConnectionManager-tp4322905p4322972.html
Sent from the Solr - User mailing list archive at Nabble.com.


using HttpSolrServer with PoolingHttpClientConnectionManager

2017-03-01 Thread Renee Sun
First of all, I apologize for the length of this message ... there are a few
questions I would appreciate your help with, please:

1. Originally I wanted to use SolrJ in my application layer (a webapp deployed
in Tomcat) to query the Solr server(s) in a multi-core, non-cloud setup.

Since I need to send XML back to my client, I realize this is not a use
case for SolrJ, so I should abandon the idea (correct?)

2. I also looked into CommonsHttpSolrServer for querying Solr directly,
which supposedly lets me set XMLResponseParser as the ResponseParser.
However, it seems CommonsHttpSolrServer is deprecated; with HttpClient 4.x I
think I should use HttpSolrServer. I do need a way to get the
returned data in XML format, and I want to use a pooled HTTP connection manager to
support multiple query threads. I thought I could do all of this with
HttpSolrServer (yes?), as below:

PoolingHttpClientConnectionManager connManager = new PoolingHttpClientConnectionManager();
connManager.setMaxTotal(5);
connManager.setDefaultMaxPerRoute(4);
... ...
CloseableHttpClient httpclient = HttpClients.custom().setConnectionManager(connManager).build();
... ...
ResponseParser parser = new XMLResponseParser();
... ...
HttpSolrServer server = new HttpSolrServer(myUrl, httpclient, parser);

... ... 
SolrQuery query = new SolrQuery();
query.setQuery(q);
query.setParam("wt", "xml"); // not needed?
... ...
QueryResponse response = server.query(query);
SolrDocumentList sdl = response.getResults();

At this point, will the documents in sdl be in XML format if I loop through
them and call toString()? Will there be overhead if this works at all? Will
SolrJ skip the XML parsing and simply return the results as-is, since I
requested the XML parser?

I somehow feel this is very fishy and that I might be better off just not using SolrJ.
What is the best practice here?
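If the goal is really just to pass Solr's own XML straight back to the client, one option I am considering is to skip SolrJ for the query path and reuse the pooled CloseableHttpClient directly; a rough sketch (the method and names are only illustrative, and it assumes the same connManager/httpclient built above):

import java.net.URLEncoder;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.util.EntityUtils;

// fetch the raw XML response from Solr, bypassing SolrJ's response parsing
static String queryAsXml(CloseableHttpClient httpclient, String coreUrl, String q) throws Exception {
    HttpGet get = new HttpGet(coreUrl + "/select?wt=xml&q=" + URLEncoder.encode(q, "UTF-8"));
    try (CloseableHttpResponse resp = httpclient.execute(get)) {
        // the entity body is already the XML I would hand back to my client
        return EntityUtils.toString(resp.getEntity(), "UTF-8");
    }
}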

3. I think my next question may be more of an HttpClient question, but it
does relate to Solr / cores, so I hope someone here can help:

when I configure PoolingHttpClientConnectionManager, for the per-route
connection limits etc., will the following two URLs be considered different
routes, or, since they hit the same server, will the
collection/core part be ignored?

String myUrl = "http://localhost:8983/solr/core1";

and

String myUrl = "http://localhost:8983/solr/core2";
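My understanding (worth double-checking) is that HttpClient routes are keyed on scheme, host and port (plus any proxy), not on the request path, so /solr/core1 and /solr/core2 on the same host should count as one route. A small sketch of pinning the limit for that specific route, reusing the host/port from my example:

import org.apache.http.HttpHost;
import org.apache.http.conn.routing.HttpRoute;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

PoolingHttpClientConnectionManager connManager = new PoolingHttpClientConnectionManager();
connManager.setMaxTotal(5);
// core1 and core2 share this single route, so they also share this limit
connManager.setMaxPerRoute(new HttpRoute(new HttpHost("localhost", 8983)), 4);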


Thanks!
Renee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/using-HttpSolrServer-with-PoolingHttpClientConnectionManager-tp4322905.html
Sent from the Solr - User mailing list archive at Nabble.com.


is there a way to match related multivalued fields of different types

2017-02-08 Thread Renee Sun
Hi -
I have a schema with three parallel multivalued attachment fields.

(text_nost and text_st are just field types defined without and with stopwords...
irrelevant to the issue here)
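The raw field definitions did not come through, but roughly they look like this (the names and types are only a sketch, not the exact schema):

<field name="attachment_name"    type="text_nost" indexed="true" stored="true" multiValued="true"/>
<field name="attachment_name_st" type="text_st"   indexed="true" stored="true" multiValued="true"/>
<field name="attachment_size"    type="int"       indexed="true" stored="true" multiValued="true"/>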

These 3 fields are parallel in terms of their values. I want to be able to
correlate those values and run a search like:

give me all attachment_names whose corresponding attachment_size > 5000

I googled and saw someone mention using dynamic fields, but I think
dynamic fields are better suited to 'type'-style values, whereas what I
have is attachment_names being just individual values.

Please advise on the best way to achieve this.
Thanks in advance!
Renee 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-there-a-way-to-match-related-multivalued-fields-of-different-types-tp4319342.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: project related configsets need to be deployed in both data and solr install folders ?

2017-02-01 Thread Renee Sun
thanks for your time!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/project-related-configsets-need-to-be-deployed-in-both-data-and-solr-install-folders-tp4317897p4318382.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: project related configsets need to be deployed in both data and solr install folders ?

2017-02-01 Thread Renee Sun
Hi Chris,
Since I have been playing with this install, I am not certain whether I have
unknowingly messed up some other settings, and I want to avoid filing a false Jira
and wasting your time.

I wiped out everything on my Solr box and did a fresh install of Solr
6.4.0, making sure my config set is placed in the data folder
(/myprojectdata/solr/data/configsets/myproject_configs). My Solr home is
set to /myprojectdata/solr/data, and it is WORKING now.

I did not have to specify configSetBaseDir in solr.xml (the one in the data
folder, /myprojectdata/solr/data/solr.xml, NOT the one in the install folder,
/opt/solr/server/solr/solr.xml); the default correctly points at the Solr
home, which is my data folder, and finds the config set.

So there is no problem; everything works fine, and I can create new cores without
any issue. There is no bug whatsoever.

Thank you for all your help!




--
View this message in context: 
http://lucene.472066.n3.nabble.com/project-related-configsets-need-to-be-deployed-in-both-data-and-solr-install-folders-tp4317897p4318369.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: project related configsets need to be deployed in both data and solr install folders ?

2017-01-31 Thread Renee Sun
Thanks Erick!

I looked at the Solr wiki, though, and if configSetBaseDir is not set, the default
should be SOLR_HOME/configsets:

    configSetBaseDir
    The directory under which configsets for solr cores can be found.
    Defaults to SOLR_HOME/configsets

and I do start Solr with:

-Dsolr.solr.home=/myprojectdata/solr/data

I also deploy my configs into:

/myprojectdata/solr/data/configsets/myproject_configs

Anyway, it looks like the default is not working?

I found https://issues.apache.org/jira/browse/SOLR-6158, which seems to
describe the configSetBaseDir issue ...

I do set configSetBaseDir in solr.xml now and it works. I just wonder why the
default won't work, or whether I did something else wrong.
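For reference, the entry I added to solr.xml is essentially this (the path is from my setup):

<solr>
  <str name="configSetBaseDir">/myprojectdata/solr/data/configsets</str>
</solr>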





--
View this message in context: 
http://lucene.472066.n3.nabble.com/project-related-configsets-need-to-be-deployed-in-both-data-and-solr-install-folders-tp4317897p4318163.html
Sent from the Solr - User mailing list archive at Nabble.com.


project related configsets need to be deployed in both data and solr install folders ?

2017-01-30 Thread Renee Sun
Hi -

We use separate Solr install and data folders, with a shared schema/config
(configsets) in a multi-core setup, and it seems the configsets need to be
deployed in both places (we are running Solr 6.4.0)?

for example, 

solr is installed in /opt/solr, thus there is folder:

/opt/solr/server/solr/configsets

we separate the data into a different partition, thus there is:

/mysolrdata/solr/data/configsets

At first, I only deployed the project configsets to the solr install folder
/opt/solr/server/solr/configsets :

/opt/solr/server/solr/configsets/myproject_configs

then when I create a core, solr complains it could not load config from
/mysolrdata/solr/data/configsets/myproject_configs (the data folder):

curl
'http://localhost:8983/solr/admin/cores?action=CREATE&name=abc&instanceDir=abc&configSet=myproject_configs'


status=400, QTime=14, org.apache.solr.common.SolrException:
Error CREATEing SolrCore 'abc': Unable to create core [abc]
Caused by: Could not load configuration from directory
/mysolrdata/solr/data/configsets/myproject_configs


So next I moved the configs to /mysolrdata/solr/data/configsets, but then it
complains that it could not load the config from the install folder
/opt/solr/server/solr/configsets/myproject_configs, with the same error.

I had to keep the same config set in both folders (I eventually made
/opt/solr/server/solr/configsets/myproject_configs a symlink to the copy under
/mysolrdata/solr/data/configsets), and it worked.
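Roughly the following, treating the data-folder copy as the canonical one:

ln -s /mysolrdata/solr/data/configsets/myproject_configs \
      /opt/solr/server/solr/configsets/myproject_configs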

I wonder if I have missed a setting that would let me deploy the configset
in only one place, either my data folder or the install folder; I assume
the benefit of that is obvious.

Thanks
Renee






--
View this message in context: 
http://lucene.472066.n3.nabble.com/project-related-configsets-need-to-be-deployed-in-both-data-and-solr-install-folders-tp4317897.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 5 leaving tomcat, will I be the only one fearing about this?

2016-10-10 Thread Renee Sun
Thanks John... yes, that was the first idea that came to our mind, but it would
require doubling our servers (in the replica data centers as well, etc.), and
we definitely can't afford the cost.

We have thought of first establishing a small pool of 'hot' servers and using
them to take the incoming new index data on the upgraded Solr version (a
relatively much smaller resource pool), and meanwhile taking one existing server
(and its replicas) at a time and upgrading them one by one. Although most (99%)
of the indexing would happen on the small hot-server pool, there are still some
updates going to the 'cold' servers at all times, so we would also need to
introduce a write lockdown on the impacted servers... With one server at a
time, the scope of impact would be reduced to its minimum...

I am pretty sure we are not the only ones that have to face the re-index
issue with a large data set... am I correct? If there is a better approach,
please share...

thanks a lot!
Renee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-5-leaving-tomcat-will-I-be-the-only-one-fearing-about-this-tp4300065p4300550.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 5 leaving tomcat, will I be the only one fearing about this?

2016-10-10 Thread Renee Sun
Shawn and Ari,
the 3rd-party jars are exactly one of the concerns I have.
We had more than just a multi-lingual integration; we have to integrate with
many other 3rd-party tools. We basically deploy all those jars into an
'external' lib extension path in production, and then for each 3rd-party tool
involved we follow its instructions to integrate with Solr/Tomcat
by symlinking the jars into either tomcat/lib or ourwebapps/WEB-INF/lib, etc. We
will need to rework these integrations in the build process, I imagine.

I am sure there will be a lot of other work if we go for this 'upgrade' ...
I bet we will need to re-index the data ... for each major Solr version
upgrade (like from 3.0 to 4.0) the data needs to be re-indexed, and this is
another huge concern.

Our data is not that BIG compared with what others have nowadays, but still,
a few hundred terabytes are a pain in the neck to push through a re-index,
resource-wise and time-wise. Almost a road block.

I have tested that Solr shard queries work when querying data from Solr
servers running different versions, so we could upgrade our production Solr
servers and reindex the data on them one by one... but still, it is almost
impractical; similar to how Ari feels, I also would rather not
upgrade ...

I guess we were fortunate to start using Solr a long time ago (its key feature
of searching email content has benefited our SAAS services), but on the other
side we have become so dependent on the old version of Solr that, given how
Solr evolves, it is hard for us to keep up with these upgrades...

A few years ago, scalability became a major issue in our system, and I did a
lot of experiments with Solr 4.0; unfortunately, back then the lack of
multi-tenancy support, as well as other fundamental flaws, drove it out
of our choices, so our team ended up developing a lightweight, scalable
layer wrapping on top of Solr, which worked very well. But here
we are... from every angle (build process, architecture, data migration, etc.)
it is scary.

Good discussion, and it is a great help ... :-)
Renee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-5-leaving-tomcat-will-I-be-the-only-one-fearing-about-this-tp4300065p4300523.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 5 leaving tomcat, will I be the only one fearing about this?

2016-10-07 Thread Renee Sun
I just read through the following link Shawn shared in his reply:
https://wiki.apache.org/solr/WhyNoWar

While the following statement is true:

"Supporting a single set of binary bits is FAR easier than worrying
about what kind of customized environment the user has chosen for their
deployment. "

But it probably also reduces flexibility... for example, we tune for
scalability at the Tomcat level, such as its thread pool, etc. I assume
standalone Solr (which still uses Jetty underneath) exposes
enough configurable 'knobs' to let me tune Solr to our
data workload.

If we want to minimize the migration work, our existing business-logic
components will remain in Tomcat, and then the fact that we will have Jetty
and Tomcat co-existing in the production system is a bit strange... or is it?

Even if I could port our webapps to Jetty, I assume that with the way Solr
embeds Jetty I would not be able to integrate at that level, so I would probably
end up with two Jetty container instances running on the same server, correct? It is
still too early for me to be sure how this will impact our system, but I am a
little worried.

Renee 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-5-leaving-tomcat-will-I-be-the-only-one-fearing-about-this-tp4300065p4300259.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 5 leaving tomcat, will I be the only one fearing about this?

2016-10-07 Thread Renee Sun
Thanks everyone, I think this is very helpful... I will post more specific
questions once we start to get more familiar with solr 6.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-5-leaving-tomcat-will-I-be-the-only-one-fearing-about-this-tp4300065p4300253.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 5 leaving tomcat, will I be the only one fearing about this?

2016-10-07 Thread Renee Sun
Thanks ... but that is an extremely simplified view of the situation.

We are not just picking up Solr as a new tool and starting to use it.

In production we have had cloud-based big-data indexing using Solr for many
years. We have developed a lot of business-related logic/components deployed as
webapps working seamlessly with Solr.

To give a simple example, we purchased multi-lingual processors (and
many other 3rd-party tools) which we integrated with Solr by carefully deploying
the libraries in the Tomcat container so they work together. This
basically means we would have to rework all those components to make them work with
Solr 5 or 6.

In my opinion, for Solr users like our company, it would really be
beneficial if Solr could keep supporting deployment as a war and maintain
that in parallel with its new standalone release, although this might be too
much work?

Thanks 
Renee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-5-leaving-tomcat-will-I-be-the-only-one-fearing-about-this-tp4300065p4300202.html
Sent from the Solr - User mailing list archive at Nabble.com.


solr 5 leaving tomcat, will I be the only one fearing about this?

2016-10-06 Thread Renee Sun
I need some general advice, please...

Our infrastructure is built with multiple webapps on Tomcat ... the scaling layer is
achieved on top of those webapps, which work hand-in-hand with the Solr admin
APIs / shard queries / commit and optimize / core management, etc.

While I have not yet had a chance to actually play with Solr 5, just from
imagining it, we will be facing some huge changes in our infrastructure to be able to
upgrade to Solr 5, yes?

Thanks
Renee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-5-leaving-tomcat-will-I-be-the-only-one-fearing-about-this-tp4300065.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to efficiently get sum of an int field

2015-11-05 Thread Renee Sun
thanks Yonik... I bet that with Solr 3.5 we do not have JSON Facet API support
yet ...
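(For anyone hitting this later: on newer Solr, 5.1 and up, the JSON Facet API can return just the sum; something along these lines, with my field name and a placeholder host/core:)

curl 'http://localhost:8983/solr/mycore/query' -d 'q=*:*&rows=0&json.facet={ total : "sum(my_field_name)" }'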



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-efficiently-get-sum-of-an-int-field-tp4238464p4238522.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to efficiently get sum of an int field

2015-11-05 Thread Renee Sun
Also, Yonik, out of curiosity... when I run stats on a large message set (such as
200 million messages), it tends to use a lot of memory; this should be expected,
correct?

If I were able to use !sum=true to get only the sum, a clever implementation should
be able to tell that only the sum is required and avoid the memory overhead; is
it implemented that way?

Anyway, I was only trying to avoid running these stats across thousands of
customers, which kills our Solr servers.

thanks
Renee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-efficiently-get-sum-of-an-int-field-tp4238464p4238520.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to efficiently get sum of an int field

2015-11-05 Thread Renee Sun
now I think that with Solr 3.5 (which we are using), !sum=true (overriding the defaults)
is probably not supported yet :-(

thanks
Renee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-efficiently-get-sum-of-an-int-field-tp4238464p4238519.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to efficiently get sum of an int field

2015-11-05 Thread Renee Sun
I did try single quotes with a backslash before the bang,
and also tried disabling the history characters...

It did not work for me.

Unfortunately, we are using Solr 3.5, which probably does not support the JSON format?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-efficiently-get-sum-of-an-int-field-tp4238464p4238497.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to efficiently get sum of an int field

2015-11-05 Thread Renee Sun
thanks!

but it is silly that I can't seem to escape the {!sum=true} properly to make
it work in my curl :-(

 time curl -d
'q=*:*&rows=0&shards=solrhostname:8080/solr/413-1,anothersolrhost:8080/solr/413-2&stats=true&stats.field={!sum=true}myfieldname'
http://localhost:8080/solr/413-1/select/? | xmllint --format -

Double quotes or single quotes, escaping only the ! or escaping both { and !, nothing
will make it work.
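One way to take the shell out of the picture is to let curl do the encoding and POST the parameters instead; a sketch with the same hosts and field as above (though if 3.5 does not understand the {!sum=true} local params, this still will not help):

time curl 'http://localhost:8080/solr/413-1/select' \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'rows=0' \
  --data-urlencode 'shards=solrhostname:8080/solr/413-1,anothersolrhost:8080/solr/413-2' \
  --data-urlencode 'stats=true' \
  --data-urlencode 'stats.field={!sum=true}myfieldname' | xmllint --format -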



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-efficiently-get-sum-of-an-int-field-tp4238464p4238478.html
Sent from the Solr - User mailing list archive at Nabble.com.


how to efficiently get sum of an int field

2015-11-05 Thread Renee Sun
Hi -
I have been using stats to get the sum of an int field, like:

&stats=true&stats.field=my_field_name&rows=0

It works fine, but when the index has hundreds of millions of messages across
sharded indices, it takes a long time.

I noticed that 'stats' gives out more information than I need (just the sum), and I
suspect the min/max/mean etc. are what cost the time.

Is there a simple way to get just the sum without the other values, and to run
it faster and with less stress on the Solr servers?

Thanks
Renee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-efficiently-get-sum-of-an-int-field-tp4238464.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: any easy way to find out when a core's index physical file has been last updated?

2015-09-04 Thread Renee Sun
Thanks a lot Shawn for the details, it is very helpful!






--
View this message in context: 
http://lucene.472066.n3.nabble.com/any-easy-way-to-find-out-when-a-core-s-index-physical-file-has-been-last-updated-tp4227044p4227274.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: any easy way to find out when a core's index physical file has been last updated?

2015-09-04 Thread Renee Sun
Shawn, thanks so much, and this user forum is so helpful!

I will start using autocommit, confident that it will greatly reduce
the false commit requests (a lot of them) from processes in our system.

Regarding the Solr version, it is actually a big problem we have to resolve
sooner or later.

When we upgraded to Solr 3.5 about 2 years ago, to avoid re-indexing our large
data set, we kept using:

LUCENE_29

which seems to work fine except for a lot of warnings like this in catalina.out:

WARNING: StopFilterFactory is using deprecated LUCENE_29 emulation. You
should at some point declare and reindex to at least 3.0, because 2.x
emulation is deprecated and will be removed in 4.0

We have built an infrastructure which scales well using Solr; is it good
practice to upgrade to Solr 4.x without using SolrCloud, if that is possible at
all?

thanks!
Renee 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/any-easy-way-to-find-out-when-a-core-s-index-physical-file-has-been-last-updated-tp4227044p4227220.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: any easy way to find out when a core's index physical file has been last updated?

2015-09-03 Thread Renee Sun
unfortunately we are still using Solr 3.5 with Lucene 2.9.3 :-( If we upgrade
to Solr 4.x it will require moving Lucene off 2.x.x, which will
require re-indexing all our data. With current measurements, it might take about
8-9 for all the data we have to be re-indexed, which is a big concern.

so, to understand autocommit better: given an autoCommit configuration along
these lines (I believe ours sets maxTime to 300000 ms, i.e. 5 minutes; the
snippet below is only a sketch),

  <autoCommit>
    <maxTime>300000</maxTime> <!-- 5 minutes -->
  </autoCommit>

I want to know:

1) If I have a batch of 2000 documents being added to the index, it may take
3 minutes to index all 2000 documents. Will the autocommit defined above kick
off a commit 5 minutes after the first of the 2000 documents is indexed?

2) Will the autocommit NOT commit if there has been no update in the last 5 minutes?

3) Does maxTime count document deletions, or does it only care about
added documents? In other words, should I use maxPendingDeletes for
document deletions?

thanks
Renee




--
View this message in context: 
http://lucene.472066.n3.nabble.com/any-easy-way-to-find-out-when-a-core-s-index-physical-file-has-been-last-updated-tp4227044p4227132.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: any easy way to find out when a core's index physical file has been last updated?

2015-09-03 Thread Renee Sun
thank you! I will look into that.

Also, I came across autoSoftCommit, which seems useful... we are still
using Solr 3.5, and I hope autoSoftCommit is included in 3.5...



--
View this message in context: 
http://lucene.472066.n3.nabble.com/any-easy-way-to-find-out-when-a-core-s-index-physical-file-has-been-last-updated-tp4227044p4227098.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: any easy way to find out when a core's index physical file has been last updated?

2015-09-03 Thread Renee Sun
Walter, thanks! 

I will do some tests using autocommit. I guess that if there is a requirement for
the console UI to make documents searchable within 10 minutes, we will need to use
autocommit with maxTime instead of maxDocs.

I also wonder, in case we need to do a 'force commit', whether the autocommit will
stay out of the way even though its maxTime has not elapsed yet, as long as there are updates?

thanks
Renee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/any-easy-way-to-find-out-when-a-core-s-index-physical-file-has-been-last-updated-tp4227044p4227091.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: any easy way to find out when a core's index physical file has been last updated?

2015-09-03 Thread Renee Sun
this makes sense now. Thanks!

Why I got onto this idea:

In our system we have a large customer base and lots of cores; each customer
may have multiple cores.

There are also a lot of processes running in our system processing data
for these customers, and every once in a while they ask a central
webapp that we wrote to commit on a core.

This central webapp is deployed alongside Solr in the same Tomcat container;
its task is mainly to be a wrapper around the local cores, managing things like
monitoring core size and merging cores if needed. I also throttle the
commit requests this webapp receives from time to time, trying to space the
commits out. In the case where multiple processes ask for commits on the same
core, my webapp guarantees that only one commit per x-minute interval gets
executed and drops the other commit requests.

Now I have just discovered that some of the processes send in a large number of commit
requests on many cores which never had any changes in the last interval.
This was due to a bug in those other processes, but the programmers there are
behind on fixing the issue. This prompted the idea of verifying the
incoming commit requests by checking the physical index files to see whether any
updates really occurred in the last interval.

I was searching for a Solr core admin RESTful API to get some metadata
about the core, such as a 'last modified' timestamp ... but did not have any
luck.
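One candidate I still want to verify: the Luke request handler exposes index-level metadata, and in the versions I have looked at its index section includes a lastModified value (I am not certain 3.5 reports it); something like this, with my local core name:

curl 'http://localhost:8080/solr/mycore/admin/luke?numTerms=0&wt=xml' | xmllint --format -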

I thought I could use the 'index' folder timestamp to get an accurate last-modified
time, but with what you just explained, that is not the case. I will
have to traverse the files in the folder and find the most recently
modified one.
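A rough one-liner for that (GNU find; the path is from my setup):

find /mysolrdata/solr/data/mycore/index -type f -printf '%T@ %p\n' | sort -n | tail -1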

any input will be appreciated. Thanks a lot!
Renee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/any-easy-way-to-find-out-when-a-core-s-index-physical-file-has-been-last-updated-tp4227044p4227084.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: any easy way to find out when a core's index physical file has been last updated?

2015-09-03 Thread Renee Sun
[core]/index is a folder holding index files.

But the index files in that folder are not just being deleted or added; they are
also being updated.

On a Linux file system, the folder's timestamp is only updated when the
files in it are added or deleted, NOT when they are updated. So if I check the index
folder timestamp, it will not accurately reflect the last time the index files
were updated.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/any-easy-way-to-find-out-when-a-core-s-index-physical-file-has-been-last-updated-tp4227044p4227058.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: any easy way to find out when a core's index physical file has been last updated?

2015-09-03 Thread Renee Sun
hum... at the beginning I also assumed that segment index files would only be deleted
or added, not modified.

But I did a test with heavy indexing ongoing, and observed an index file
in [core]/index, with the latest updated timestamp, keep growing for about 7
minutes... I am not sure whether the new writes caused a merge (the file being
updated is pretty big, so it could be merging), but that does mean
index files can be modified.

thanks
Renee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/any-easy-way-to-find-out-when-a-core-s-index-physical-file-has-been-last-updated-tp4227044p4227049.html
Sent from the Solr - User mailing list archive at Nabble.com.


any easy way to find out when a core's index physical file has been last updated?

2015-09-03 Thread Renee Sun
I need to figure out when the last index activity on a core was.

I can't use the [corename]/index timestamp, because it only reflects file
deletion or addition, not file updates.

I am curious whether there is any Solr core admin RESTful API kind of thing I can
use to get a last-modified timestamp for the physical index ...

Thanks
Renee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/any-easy-way-to-find-out-when-a-core-s-index-physical-file-has-been-last-updated-tp4227044.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: is there any way to tell delete by query actually deleted anything?

2015-09-02 Thread Renee Sun
thanks Shawn...

on the other side, I have just created a thin-layer webapp that I deploy alongside
Solr in Tomcat. This webapp provides a RESTful API that allows all kinds of clients in
our system to call it and request a commit on a certain core on that Solr
server.

I put it in with the idea of having a central/final place to control commits on
the cores of the local Solr server.

So far it works by reducing the arbitrary requests; for example, I will not
allow two commit requests from different clients on the same core to happen
too close to each other, and I will disregard the second request if the first
was executed less than 5 minutes ago.

I am thinking of enhancing this webapp to check the physical index directory timestamp and
drop the request if the core has not changed since the last commit. This
will prevent clients from blindly committing on all cold cores when
only one of them was actually updated.

What I mean to ask is: is there any Solr admin metadata I can fetch through a RESTful
API to get data such as the index's last updated time, or something like that?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-there-any-way-to-tell-delete-by-query-actually-deleted-anything-tp4226776p4226818.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: is there any way to tell delete by query actually deleted anything?

2015-09-02 Thread Renee Sun
Hi Erick... as Shawn pointed out, I am not using SolrCloud; I am using a
more complicated, home-grown sharding scheme...
thanks for your response :-)
Renee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-there-any-way-to-tell-delete-by-query-actually-deleted-anything-tp4226776p4226806.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: is there any way to tell delete by query actually deleted anything?

2015-09-02 Thread Renee Sun
Hi Shawn,
I think we have a similar structure, except we say frontier/back instead of
hot/cold :-)

So yes, we will probably have to do the same.

Since we have large customers, and some of them may have terabytes of data and
end up with hundreds of cold cores, blindly broadcasting the delete to all
of them is a performance killer.

I am thinking of adding an in-memory inventory of coreID : docID so I can
identify which core a document is in efficiently... what do you think
about that?

thanks
Renee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-there-any-way-to-tell-delete-by-query-actually-deleted-anything-tp4226776p4226805.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: is there any way to tell delete by query actually deleted anything?

2015-09-02 Thread Renee Sun
Shawn,
thanks for the reply.

I have a sharded index. When I re-index a document (as opposed to indexing a new one,
which is a different process), I need to delete the old one first to avoid duplicates. We all
know that if there is only one core, the newly added document will replace
the old one, but with multiple cores we have to issue the delete
command to ALL shards first, since we do NOT know/remember which core the old
document was indexed into ...

I also wanted to know if there is a better way to handle this efficiently.

Anyway, we send the delete to all of this customer's cores; one of them
gets a hit, the others do not.

But consequently, when I need to decide about the commit, I do NOT want to blindly
commit on all cores; I want to know which one actually had the old document so I
only send the commit to that core.

I could alternatively query first and skip the delete if there is no hit, but delete
if there is; and I can't short-circuit, since we have duplicates :-( for
historical reasons.
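The query-first check I have in mind would be roughly this, per core (the uniqueKey field name and id are placeholders):

curl 'http://localhost:8080/solr/mycore/select?q=id:THE_DOC_ID&rows=0&wt=xml' | xmllint --format -
# numFound in the <result> element tells me whether this core holds the old document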

Any suggestions on how to make this more efficient?
 
thanks!






--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-there-any-way-to-tell-delete-by-query-actually-deleted-anything-tp4226776p4226788.html
Sent from the Solr - User mailing list archive at Nabble.com.


is there any way to tell delete by query actually deleted anything?

2015-09-02 Thread Renee Sun
I ran this curl command, trying to delete some messages:

curl 'http://localhost:8080/solr/mycore/update?commit=true&stream.body=<delete><id>abacd</id></delete>' | xmllint --format -

or

curl 'http://localhost:8080/solr/mycore/update?commit=true&stream.body=<delete><query>myfield:mycriteria</query></delete>' | xmllint --format -

the result I got looks like this (omitting curl's progress meter):

  <?xml version="1.0" encoding="UTF-8"?>
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">10</int>
    </lst>
  </response>


Is there an easy way for me to get the number of documents actually deleted? I
mean, if the query did not hit any documents, I want to know that nothing got
deleted; but if it did hit documents, I would like to know how many were
deleted...
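The only workaround I can think of so far is to run the same query with rows=0 first and read numFound before issuing the delete; roughly:

curl 'http://localhost:8080/solr/mycore/select?q=myfield:mycriteria&rows=0' | xmllint --format -
# numFound here is how many documents the delete would remove (barring concurrent updates)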

thanks
Renee



--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-there-any-way-to-tell-delete-by-query-actually-deleted-anything-tp4226776.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: partial optimize does not reduce the segment number to maxNumSegments

2011-04-15 Thread Renee Sun
sorry, I should have elaborated on that earlier...

In our production environment we have multiple cores and ingest
continuously all day long; we only optimize periodically, once
a day at midnight.

So sometimes we see 'too many open files' errors. To prevent that from
happening, in production we maintain a script that monitors the total number of
segment files across all cores and sends out warnings if that number exceeds a
threshold... it is a kind of preventive measure. Currently we use
a Linux command to count the files. We are wondering whether we can simply use
a formula to figure out this number; that would be better. It seems we
could use the stats URL to get the segment count and multiply it by 8 (that is
what we get, given our schema).
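The counting itself is nothing fancy; a rough sketch of the kind of loop we run (paths and threshold are illustrative):

THRESHOLD=2000
for dir in /xxx/solr/data/*/index; do
  count=$(ls "$dir" | wc -l)
  [ "$count" -gt "$THRESHOLD" ] && echo "WARNING: $dir has $count index files"
done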

Any better way to approach this? thanks a lot!
Renee

--
View this message in context: 
http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2825736.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: partial optimize does not reduce the segment number to maxNumSegments

2011-04-15 Thread Renee Sun
yeah, I can get the segment count from Solr's stats page...
but my question was how to figure out the exact total number of files in the 'index'
folder for each core.

Like I mentioned in my previous message, I currently have 8 files per segment
(.prx, .tii, etc.), but it seems this might change if I use term vectors, for
example. So I need suggestions on how to accurately figure out the total
file count.

thanks

--
View this message in context: 
http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2817912.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: partial optimize does not reduce the segment number to maxNumSegments

2011-04-15 Thread Renee Sun
thanks! 

It seems the file count in the index directory is segment# * 8 in my dev
environment...

I see there are .fnm .frq .fdt .fdx .nrm .prx .tii .tis (8) file extensions,
and each has as many files as there are segments.

Is it always safe to calculate the file count as the segment number multiplied
by 8? Of course this excludes the segments_N, segments.gen and *_del files.

I found that most of the cores have a file count that matches the formula
above, but a few cores do not...

thanks
Renee

--
View this message in context: 
http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2813419.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: partial optimize does not reduce the segment number to maxNumSegments

2011-04-12 Thread Renee Sun
ok, I dug into this more and realize the file extensions can vary depending on
the schema, right?
For instance we don't have *.tvx, *.tvd, *.tvf (we are not using term vectors)... and
I suspect the file extensions may change with future Lucene releases?

So now it seems we can't just count the files using a formula; we have to list
all files in that directory and count them that way... any insight will be
appreciated.
thanks
Renee

--
View this message in context: 
http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2813561.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: partial optimize does not reduce the segment number to maxNumSegments

2011-04-12 Thread Renee Sun
Hi Hoss,
thanks for your response...

you are right, I had a typo in my question, but I did use maxSegments, and
here is the exact URL I used:

 curl
'http://localhost:8080/solr/97/update?optimize=true&maxSegments=10&waitFlush=true'

I used JConsole and du -sk to monitor each partial optimize, and I am sure
the optimize completed; it always reduces the segment files from 130+ to 65+
when I start with maxSegments=10, and when I run it again with maxSegments=9
it reduces to somewhere around 50.

When I use maxSegments=2 it always reduces the segments to 18, and
maxSegments=1 (a full optimize) always reduces the core to 10 segment files.

This has been repeated about a dozen times.

I think the resulting file count depends on the size of the core. I
have a core that takes 10GB of disk space and has 4 million documents.

It perhaps also depends on other Solr/Lucene configuration? Let me know if
I should give you any data from our Solr config.

Here is the actual data from the test I ran recently, for your reference; you
can see that each partial optimize definitely finished, and the time spent is also
included (please note I am using a core id there which is different from yours):

/tmp # ls /xxx/solr/data/32455077/index | wc   ---> this is the
start point, 150 seg files
 150  150 946
/tmp # time curl
'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=10&waitFlush=true'
real0m36.050s
user0m0.002s
sys0m0.003s

/tmp # ls /xxx/solr/data/32455077/index | wc-> after first
partial optimize (10), reduce to 82
 82  82 746
/tmp # time curl
'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=9&waitFlush=true'
real1m54.364s
user0m0.003s
sys0m0.002s

/tmp # ls /xxx/solr/data/32455077/index | wc
 74  74 674
/tmp # time curl
'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=8&waitFlush=true'
real2m0.443s
user0m0.002s
sys0m0.003s

/tmp # ls /xxx/solr/data/32455077/index | wc
 66  66 602
/tmp # time curl
'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=7&waitFlush=true'

real3m22.201s
user0m0.002s
sys0m0ls 

/tmp # ls /xxx/solr/data/32455077/index | wc
 58  58 530
/tmp #  time curl
'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=6&w 
real3m29.277s
user0m0.001s
sys0m0.004s

/tmp # ls /xxx/solr/data/32455077/index | wc
 50  50 458
/tmp # time curl
'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=5&w 
real3m41.514s
user0m0.003s
sys0m0.003s

/tmp # ls /xxx/solr/data/32455077/index | wc
 42  42 386
/tmp # time curl
'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=4&w 
real5m35.697s
user0m0.003s
sys0m0.004s

/tmp # ls /xxx/solr/data/32455077/index | wc
 34  34 314
/tmp # time curl
'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=3wa 
real7m8.773s
user0m0.003s
sys0m0.002s

/tmp # ls /xxx/solr/data/32455077/index | wc 
 26  26 242
/tmp # time curl
'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=2&w 
real9m18.814s
user0m0.004s
sys0m0.001s

/tmp # ls /xxx/solr/data/32455077/index | wc
 18  18 170
/tmp # time curl
'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=1&w
(full optimize)
real16m6.599s
user0m0.003s
sys0m0.004s

Disk Space Usage:
first 3 runs took about 20% extra 
middle couple runs took about 50% extra 
last full optimize took 100% extra


--
View this message in context: 
http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2812415.html
Sent from the Solr - User mailing list archive at Nabble.com.


partial optimize does not reduce the segment number to maxNumSegments

2011-03-15 Thread Renee Sun
I have a core with 120+ segment files and I tried a partial optimize specifying
maxNumSegments=10; after the optimize, the segment files were reduced to 64 files.

I did the same optimize again, and it reduced to 30-something.

This keeps going, and eventually it drops to the teens.

I was expecting the optimize to result in exactly 10 segment files, or
somewhere near that; why do I have to manually repeat the optimize to reach
that number?

thanks
Renee 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2682195.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Upgrade to Solr 1.4, very slow at start up when loading all cores

2010-10-13 Thread Renee Sun

just an update on this issue...

We turned off the new/first searchers (and upgraded to Solr 1.4.1) and ran
benchmark tests; there is no noticeable performance impact on the queries we
perform compared with the Solr 1.3 benchmark tests that ran WITH new/first searchers.

Also, memory usage dropped by 5.5 GB after loading the cores with our
data volume, by turning these static warming caches off.

We will take this approach in our production environment, but meanwhile I am
curious whether this issue will be addressed: it seems the new/first searchers do
not really buy any performance benefit given how much memory they use,
especially at core-loading time.

thanks
Renee

 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-to-Solr-1-4-very-slow-at-start-up-when-loading-all-cores-tp1608728p1697609.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: using HTTPClient sending solr ping request wont timeout as specified

2010-10-13 Thread Renee Sun

Ken, 
looks like we posted at the same time :-)
thanks very much!
Renee
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/using-HTTPClient-sending-solr-ping-request-wont-timeout-as-specified-tp1691292p1695584.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: using HTTPClient sending solr ping request wont timeout as specified

2010-10-13 Thread Renee Sun

thanks Michael, I got it resolved last night... you are right, it was more
of an HttpClient issue; I confirmed that after trying another URL unrelated to Solr. If
anyone is interested, here is the working code:


HttpClientParams httpClientParams = new HttpClientParams();
httpClientParams.setSoTimeout(timeout);                 // socket (read) timeout
// set connection parameters
HttpConnectionManagerParams httpConnectionMgrParams = new HttpConnectionManagerParams();
httpConnectionMgrParams.setConnectionTimeout(timeout);  // connection timeout
HttpConnectionManager httpConnectionMgr = new SimpleHttpConnectionManager();
httpConnectionMgr.setParams(httpConnectionMgrParams);
// create httpclient
HttpClient client = new HttpClient(httpClientParams, httpConnectionMgr);

HttpMethod method = new GetMethod(solrReq);

thanks
Renee
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/using-HTTPClient-sending-solr-ping-request-wont-timeout-as-specified-tp1691292p1695551.html
Sent from the Solr - User mailing list archive at Nabble.com.


using HTTPClient sending solr ping request wont timeout as specified

2010-10-12 Thread Renee Sun

I am using the following code to send a Solr request from a webapp; please
note the timeout setting:


HttpClient client = new HttpClient();
HttpMethod method = new GetMethod(solrReq);

method.getParams().setParameter(HttpConnectionParams.SO_TIMEOUT, new Integer(15000));

client.executeMethod(method);
int statcode = method.getStatusCode();

if (statcode == HttpStatus.SC_OK)
{
    ... ...

When 'solrReq' is a Solr query URL (such as
http://[host]:8080/solr/blah/select?q=x), the request times out in 15 seconds
if the server is not responsive.

However, when 'solrReq' is a Solr ping (such as
http://[host]:8080/solr/default/admin/ping), it won't time out in 15 seconds;
it seems to time out after a few minutes instead.

Any ideas or suggestions?
thanks
Renee
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/using-HTTPClient-sending-solr-ping-request-wont-timeout-as-specified-tp1691292p1691292.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: using HTTPClient sending solr ping request wont timeout as specified

2010-10-12 Thread Renee Sun

I also added the following timeouts for the connection; still not working:

client.getParams().setSoTimeout(httpClientPingTimeout);
client.getParams().setConnectionManagerTimeout(httpClientPingTimeout);

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/using-HTTPClient-sending-solr-ping-request-wont-timeout-as-specified-tp1691292p1691355.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Upgrade to Solr 1.4, very slow at start up when loading all cores

2010-10-05 Thread Renee Sun

Hi Yonik,
I tried the fix suggested in your comments (using "solr.TrieDateField"),
and it loaded all 130 cores in 1 minute with 1.3GB of memory (a little more than the
1GB when turning the static warming caches off, and much less than the 6.5GB
when using 'solr.DateField').
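For anyone following along, the change is switching the date fields in schema.xml from solr.DateField to solr.TrieDateField; roughly like this (the field name and precisionStep are illustrative):

<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
...
<field name="timestamp" type="tdate" indexed="true" stored="true"/>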

Will this have any impact on the first query or on performance?
I am about to run some benchmark tests and compare with the old data; I will update
you.
Renee 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-to-Solr-1-4-very-slow-at-start-up-when-loading-all-cores-tp1608728p1637176.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Upgrade to Solr 1.4, very slow at start up when loading all cores

2010-10-01 Thread Renee Sun

http://lucene.472066.n3.nabble.com/file/n1617135/solrconfig.xml
solrconfig.xml 

Hi Yonik,
I have uploaded our solrconfig.xml file for your reference.

We also tried 1.4.1; for the same index data it took about 30-55 minutes to
load all 130 cores, so it did not help at all.

There are no queries running while we load the cores.

Since JConsole is not responding at all when this happens, I am not sure whether
there is a command-line memory profiler I can use to collect information;
any suggestions?
Renee

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-to-Solr-1-4-very-slow-at-start-up-when-loading-all-cores-tp1608728p1617135.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Upgrade to Solr 1.4, very slow at start up when loading all cores

2010-10-01 Thread Renee Sun

Hi Yonik,

I attached the solrconfig.xml in my previous post, and we do have the
firstSearcher and newSearcher hooks.
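(For context, the hooks in question are the usual QuerySenderListener entries in solrconfig.xml; ours are along these lines, with an illustrative warming query:)

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="rows">10</str></lst>
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="rows">10</str></lst>
  </arr>
</listener>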

I commented them out, and all 130 cores loaded in 1 minute, the same as in Solr
1.3. Total memory used was about 1GB, whereas in 1.3, with the hooks, it took
about 6.5GB for the same amount of data.

I assume the consequence of commenting out the static warming requests is that
the first query against a core will be slower?

thanks
Renee
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-to-Solr-1-4-very-slow-at-start-up-when-loading-all-cores-tp1608728p1617263.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Upgrade to Solr 1.4, very slow at start up when loading all cores

2010-09-30 Thread Renee Sun

Hi Yonik,
thanks for your reply.

I entered a bug for this at:
https://issues.apache.org/jira/browse/SOLR-2138

To answer your questions here:

  - do you have any warming queries configured?
    > no, all autowarmCount values are set to 0 for all caches
  - do the cores have documents already, and if so, how many per core?
    > yes, 130 cores in total; 2 or 3 of them already have 1 to 2.4 million
      documents, the others have about 50,000 documents
  - are you using the same schema & solrconfig, or did you upgrade?
    > yes, absolutely no change
  - have you tried finding out what is taking up all the memory (or all the CPU time)?
    > yes, JConsole shows that after 70 cores are loaded in about 4 minutes, all
      16GB of memory is taken and the rest of the cores load extremely slowly. The
      memory remains high and never drops.

We are in the process of upgrading to 1.4.1.

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-to-Solr-1-4-very-slow-at-start-up-when-loading-all-cores-tp1608728p1611030.html
Sent from the Solr - User mailing list archive at Nabble.com.


Upgrade to Solr 1.4, very slow at start up when loading all cores

2010-09-30 Thread Renee Sun

Hi -
I posted this problem before but got no response; I guess I need to post it in the
Solr-User forum. Hopefully you can help me with this.

We had been running Solr 1.3 for a long time, with 130 cores. We just upgraded to Solr
1.4, and now when we start Solr it takes about 45 minutes. The catalina.log
shows Solr loading all the cores very slowly.

We did an optimize; it did not help at all.

I ran JConsole to monitor the memory. I noticed the first 70 cores were
loaded pretty fast, in 3-4 minutes.

But after that, the memory went all the way up to about 15GB (we allocated 16GB
to Solr), and loading slows down right there, getting slower and slower. We use the
concurrent GC; JConsole shows only ParNew GCs kicking off, and they don't bring
the memory down.

With Solr 1.3, all 130 cores loaded in 5-6 minutes.

Please let me know if there is a known memory issue with Solr 1.4, or whether there is
something (configuration) we need to tweak to make it work efficiently in 1.4.

thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Upgrade-to-Solr-1-4-very-slow-at-start-up-when-loading-all-cores-tp1608728p1608728.html
Sent from the Solr - User mailing list archive at Nabble.com.