Cache fails to warm after Replication Recovery in solr cloud

2019-12-24 Thread Cao, Li
Hi!

I have a custom cache set up in solrconfig.xml for a SolrCloud cluster in 
Kubernetes. Each node has Kubernetes persistence set up. After I execute a 
“delete pod” command to restart a node, it goes through replication recovery 
successfully, but my custom cache’s warm() method never gets called. Is this 
expected behavior? The events I observed are:

  1.  Cache init() method called
  2.  Searcher created and registered
  3.  Replication recovery
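
For reference, the cache is declared along these lines in solrconfig.xml (the
names, sizes, and classes below are placeholders, not the exact ones we use):

  <cache name="myCustomCache"
         class="com.example.MyCustomCache"
         size="1024"
         autowarmCount="128"
         regenerator="com.example.MyCacheRegenerator"/>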


Thanks!

Li


Re: [EXTERNAL] Autoscaling simulation error

2019-12-23 Thread Cao, Li
Thank you for creating the JIRA! Will follow

On 12/19/19, 11:09 AM, "Andrzej Białecki"  wrote:

Hi,

Thanks for the data. I see the problem now - it’s a bug in the simulator. I 
filed a Jira issue to track and fix it: SOLR-14122.

> On 16 Dec 2019, at 19:13, Cao, Li  wrote:
>
>> I am using solr 8.3.0 in cloud mode. I have collection level autoscaling 
policy and the collection name is “entity”. But when I run autoscaling 
simulation all the steps failed with this message:
>>
>>   "error":{
>> "exception":"java.io.IOException: 
java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: 
org.apache.solr.common.SolrException: Could not find collection : 
entity/shards",
>> "suggestion":{
>>   "type":"repair",
>>   "operation":{
>> "method":"POST",
>> "path":"/c/entity/shards",
>> "command":{"add-replica":{
>> "shard":"shard2",
>> "node":"my_node:8983_solr",
>> "type":"TLOG",
>> "replicaInfo":null}}},





Re: [EXTERNAL] Re: "No value present" when set cluster policy for autoscaling in solr cloud mode

2019-12-23 Thread Cao, Li
Thank you, Andrzej! I am going to try the IN operand as a workaround.

On 12/19/19, 10:17 AM, "Andrzej Białecki"  wrote:

Hi,

For some strange reason global tags (such as “cores”) don’t support the 
“nodeset” syntax. For “cores” the only supported attribute is “node”, and then 
you’re only allowed to use #ANY or a single specific node name (with an optional 
“!” NOT operand), or a JSON array containing node names to indicate the IN 
operand.
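
So a rule along these lines should be accepted (the node names here are just
invented examples):

{ "set-cluster-policy" : [
    { "cores" : "<3",
      "node"  : ["nodeA:8983_solr", "nodeB:8983_solr"] } ] }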

The Ref Guide indeed is not very clear on that…


> On 17 Dec 2019, at 21:20, Cao, Li  wrote:
>
> Hi!
>
> I am trying to add a cluster policy to a freshly built 8.3.0 cluster (no 
collection added). I got this error when adding such a cluster policy
>
> { 
"set-cluster-policy":[{"cores":"<3","nodeset":{"sysprop.rex.node.type":"tlog"}}]}
>
> Basically I want to limit the number of cores for certain machines with a 
special environmental variable value.
>
> But I got this error response:
>
> {
>  "responseHeader":{
>"status":400,
>"QTime":144},
>  "result":"failure",
>  "WARNING":"This response format is experimental.  It is likely to change 
in the future.",
>  "error":{
>"metadata":[
>  "error-class","org.apache.solr.api.ApiBag$ExceptionWithErrObject",
>  
"root-error-class","org.apache.solr.api.ApiBag$ExceptionWithErrObject"],
>"details":[{
>"set-cluster-policy":[{
>"cores":"<3",
>"nodeset":{"sysprop.rex.node.type":"tlog"}}],
>    "errorMessages":["No value present"]}],
>"msg":"Error in command payload",
>"code":400}}
>
> However, this works:
>
> { "set-cluster-policy":[{"cores":"<3","node":"#ANY"}]}
>
> I read the autoscaling policy documentations and cannot figure out why. 
Could someone help me on this?
>
> Thanks!
>
> Li




"No value present" when set cluster policy for autoscaling in solr cloud mode

2019-12-17 Thread Cao, Li
Hi!

I am trying to add a cluster policy to a freshly built 8.3.0 cluster (no 
collection added). I got this error when adding the following cluster policy:

{ 
"set-cluster-policy":[{"cores":"<3","nodeset":{"sysprop.rex.node.type":"tlog"}}]}

Basically I want to limit the number of cores for certain machines that have a 
special environment variable value.

But I got this error response:

{
  "responseHeader":{
"status":400,
"QTime":144},
  "result":"failure",
  "WARNING":"This response format is experimental.  It is likely to change in 
the future.",
  "error":{
"metadata":[
  "error-class","org.apache.solr.api.ApiBag$ExceptionWithErrObject",
  "root-error-class","org.apache.solr.api.ApiBag$ExceptionWithErrObject"],
"details":[{
"set-cluster-policy":[{
"cores":"<3",
"nodeset":{"sysprop.rex.node.type":"tlog"}}],
"errorMessages":["No value present"]}],
"msg":"Error in command payload",
"code":400}}

However, this works:

{ "set-cluster-policy":[{"cores":"<3","node":"#ANY"}]}

I read the autoscaling policy documentation and cannot figure out why. Could 
someone help me with this?

Thanks!

Li


Re: [EXTERNAL] Re: Autoscaling simulation error

2019-12-16 Thread Cao, Li
Hi Andrzej ,

I have put the JSONs produced by "save" commands below:

autoscalingState.json - https://pastebin.com/CrR0TdLf
clusterState.json - https://pastebin.com/zxuYAMux
nodeState.json https://pastebin.com/hxqjVUfV
statistics.json https://pastebin.com/Jkaw8Y3j

The simulate command is:
/opt/solr-8.3.0/bin/solr autoscaling -a policy2.json -simulate  -zkHost 
rexcloud-swoods-zookeeper-headless:2181

Policy2 can be found here:
https://pastebin.com/VriJ27DE

Setup:
12 nodes on Kubernetes: 6 for TLOG and 6 for PULL replicas. The simulation is run on one 
of the nodes inside Kubernetes because it needs access to the ZooKeeper ensemble inside Kubernetes.

Thanks!

Li


On 12/15/19, 5:13 PM, "Andrzej Białecki"  wrote:

Could you please provide the exact command-line? It would also help if you 
could provide an autoscaling snapshot of the cluster (bin/solr autoscaling 
-save ) or at least the autoscaling diagnostic info.

(Please note that the mailing list removes all attachments, so just provide 
a link to the snapshot).
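
Something along these lines should produce the snapshot (the output directory
and the ZK address are up to you):

bin/solr autoscaling -zkHost <zk-host>:2181 -save /tmp/autoscaling-snapshot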


> On 15 Dec 2019, at 18:42, Cao, Li  wrote:
>
> Hi!
>
> I am using solr 8.3.0 in cloud mode. I have collection level autoscaling 
policy and the collection name is “entity”. But when I run autoscaling 
simulation all the steps failed with this message:
>
>"error":{
>  "exception":"java.io.IOException: 
java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: 
org.apache.solr.common.SolrException: Could not find collection : 
entity/shards",
>  "suggestion":{
>"type":"repair",
>"operation":{
>  "method":"POST",
>  "path":"/c/entity/shards",
>  "command":{"add-replica":{
>  "shard":"shard2",
>      "node":"my_node:8983_solr",
>  "type":"TLOG",
>  "replicaInfo":null}}},
>
> Does anyone know how to fix this? Is this a bug?
>
> Thanks!
>
> Li




Autoscaling simulation error

2019-12-15 Thread Cao, Li
Hi!

I am using Solr 8.3.0 in cloud mode. I have a collection-level autoscaling policy, 
and the collection name is “entity”. But when I run the autoscaling simulation, all 
the steps fail with this message:

"error":{
  "exception":"java.io.IOException: 
java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: 
org.apache.solr.common.SolrException: Could not find collection : 
entity/shards",
  "suggestion":{
"type":"repair",
"operation":{
  "method":"POST",
  "path":"/c/entity/shards",
  "command":{"add-replica":{
  "shard":"shard2",
  "node":"my_node:8983_solr",
  "type":"TLOG",
  "replicaInfo":null}}},

Does anyone know how to fix this? Is this a bug?

Thanks!

Li


Re: what's in cursorMark

2018-10-01 Thread Li, Yi
Hi,

Did you just do Base64 decoding?

Thanks,
Yi

On 10/1/18, 9:41 AM, "Vincenzo D'Amore"  wrote:

Hi Yi,

have you tried to decode the string?

AoE/E2Zhdm9yaXRlUGxhY2UvZjg1MzMzYzEtYzQ0NC00Y2ZiLWFmZDctMzcyODFhMDdiMGY3

seems to be only:

? favoritePlace/f85333c1-c444-4cfb-afd7-37281a07b0f7
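
(just plain Base64, e.g. something like:

echo 'AoE/E2Zhdm9yaXRlUGxhY2UvZjg1MzMzYzEtYzQ0NC00Y2ZiLWFmZDctMzcyODFhMDdiMGY3' | base64 -d

which prints a couple of leading binary bytes followed by the readable sort value)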



On Mon, Oct 1, 2018 at 3:37 PM Li, Yi  wrote:

> Hi,
>
> cursorMark appears as something like
> AoE/E2Zhdm9yaXRlUGxhY2UvZjg1MzMzYzEtYzQ0NC00Y2ZiLWFmZDctMzcyODFhMDdiMGY3
>
> and the document says it is “Base64 encoded serialized representation of
> the sort values encapsulated by this object”
>
> I like to know if I can decode and what content I will see in there.
>
> For example, If there is an object as a json:
> {
> “id”:”123”,
> “name”:”objectname”,
> “secret”:”my secret”
> }
> if I search id:123, and only that object returned with a cursorMark, will
> I be able to decode the cursorMark and get that secret?
>
> Thanks,
> Yi
>


-- 
Vincenzo D'Amore




what's in cursorMark

2018-10-01 Thread Li, Yi
Hi,

cursorMark appears as something like 
AoE/E2Zhdm9yaXRlUGxhY2UvZjg1MzMzYzEtYzQ0NC00Y2ZiLWFmZDctMzcyODFhMDdiMGY3

and the document says it is “Base64 encoded serialized representation of the 
sort values encapsulated by this object”

I'd like to know if I can decode it, and what content I will see in there.

For example, if there is an object like this JSON:
{
“id”:”123”,
“name”:”objectname”,
“secret”:”my secret”
}
If I search id:123 and only that object is returned with a cursorMark, will I be 
able to decode the cursorMark and get that secret?

Thanks,
Yi


Running Solr 5.3.1 with JDK10

2018-06-19 Thread Li, Yi
Hi,

Currently we are running Solr 5.3.1 with JDK 8 and we are trying to run Solr 
5.3.1 with JDK 10. Initially we got a few errors complaining that some JVM options 
had been removed since JDK 9. We removed those options in solr.in.sh:
UseConcMarkSweepGC
UseParNewGC
PrintHeapAtGC
PrintGCDateStamps
PrintGCTimeStamps
PrintTenuringDistribution
PrintGCApplicationStoppedTime

And the options left in solr.in.sh:

# Enable verbose GC logging
GC_LOG_OPTS="-verbose:gc -XX:+PrintGCDetails"

# These GC settings have shown to work well for a number of common Solr
# workloads
GC_TUNE="-XX:NewRatio=3 \
-XX:SurvivorRatio=4 \
-XX:TargetSurvivorRatio=90 \
-XX:MaxTenuringThreshold=8 \
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
-XX:+CMSScavengeBeforeRemark \
-XX:PretenureSizeThreshold=64m \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:CMSInitiatingOccupancyFraction=50 \
-XX:CMSMaxAbortablePrecleanTime=6000 \
-XX:+CMSParallelRemarkEnabled \
-XX:+ParallelRefProcEnabled"

After that, Solr runs, but it logs an error from SystemInfoHandler: "Error getting JMX 
properties".
[root@centos6 logs]# service solr status
Found 1 Solr nodes:
Solr process 4630 running on port 8983
ERROR: Failed to get system information from http://localhost:8983/solr due to: 
java.lang.NullPointerException

Can someone share their experience using Solr 5.3.x with JDK 9 or above?
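
For what it's worth, on JDK 10 we are also considering replacing the settings
above with G1 and unified GC logging, roughly like this (untested; the log
path and pause target are placeholders):

GC_LOG_OPTS="-Xlog:gc*:file=/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M"
GC_TUNE="-XX:+UseG1GC -XX:MaxGCPauseMillis=250"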

Thanks,
Yi

P.S.
Console output:
[0.001s][warning][gc] -Xloggc is deprecated. Will use 
-Xlog:gc:/solr/logs/solr_gc.log instead.
[0.001s][warning][gc] -XX:+PrintGCDetails is deprecated. Will use -Xlog:gc* 
instead.
[0.003s][info ][gc] Using Serial
WARNING: System properties and/or JVM args set. Consider using --dry-run or 
--exec
0 INFO (main) [ ] o.e.j.u.log Logging initialized @532ms
205 INFO (main) [ ] o.e.j.s.Server jetty-9.2.11.v20150529
218 WARN (main) [ ] o.e.j.s.h.RequestLogHandler !RequestLog
220 INFO (main) [ ] o.e.j.d.p.ScanningAppProvider Deployment monitor 
file:/home/solr/solr-5.3.1/server/contexts/ at interval 0
559 INFO (main) [ ] o.e.j.w.StandardDescriptorProcessor NO JSP Support for 
/solr, did not find org.apache.jasper.servlet.JspServlet
569 WARN (main) [ ] o.e.j.s.SecurityHandler 
ServletContext@o.e.j.w.WebAppContext@1a75e76a
{/solr,file:/home/solr/solr-5.3.1/server/solr-webapp/webapp/,STARTING} 
{/home/solr/solr-5.3.1/server/solr-webapp/webapp} has uncovered http methods 
for path: /
577 INFO (main) [ ] o.a.s.s.SolrDispatchFilter SolrDispatchFilter.init(): 
WebAppClassLoader=1904783235@7188af83
625 INFO (main) [ ] o.a.s.c.SolrResourceLoader JNDI not configured for solr 
(NoInitialContextEx)
626 INFO (main) [ ] o.a.s.c.SolrResourceLoader using system property 
solr.solr.home: /solr/data
627 INFO (main) [ ] o.a.s.c.SolrResourceLoader new SolrResourceLoader for 
directory: '/solr/data/'
750 INFO (main) [ ] o.a.s.c.SolrXmlConfig Loading container configuration from 
/solr/data/solr.xml
817 INFO (main) [ ] o.a.s.c.CoresLocator Config-defined core root directory: 
/solr/data
[1.402s][info ][gc] GC(0) Pause Full (Metadata GC Threshold) 85M->7M(490M) 
37.281ms
875 INFO (main) [ ] o.a.s.c.CoreContainer New CoreContainer 1193398802
875 INFO (main) [ ] o.a.s.c.CoreContainer Loading cores into CoreContainer 
[instanceDir=/solr/data/]
875 INFO (main) [ ] o.a.s.c.CoreContainer loading shared library: /solr/data/lib
875 WARN (main) [ ] o.a.s.c.SolrResourceLoader Can't find (or read) directory 
to add to classloader: lib (resolved as: /solr/data/lib).
889 INFO (main) [ ] o.a.s.h.c.HttpShardHandlerFactory created with 
socketTimeout : 60,connTimeout : 6,maxConnectionsPerHost : 
20,maxConnections : 1,corePoolSize : 0,maximumPoolSize : 
2147483647,maxThreadIdleTime : 5,sizeOfQueue : -1,fairnessPolicy : 
false,useRetries : false,
1036 INFO (main) [ ] o.a.s.u.UpdateShardHandler Creating UpdateShardHandler 
HTTP client with params: socketTimeout=60=6=true
1038 INFO (main) [ ] o.a.s.l.LogWatcher SLF4J impl is 
org.slf4j.impl.Log4jLoggerFactory
1039 INFO (main) [ ] o.a.s.l.LogWatcher Registering Log Listener [Log4j 
(org.slf4j.impl.Log4jLoggerFactory)]
1040 INFO (main) [ ] o.a.s.c.CoreContainer Security conf doesn't exist. 
Skipping setup for authorization module.
1041 INFO (main) [ ] o.a.s.c.CoreContainer No authentication plugin used.
1179 INFO (main) [ ] o.a.s.c.CoresLocator Looking for core definitions 
underneath /solr/data
1180 INFO (main) [ ] o.a.s.c.CoresLocator Found 0 core definitions
1185 INFO (main) [ ] o.a.s.s.SolrDispatchFilter 
user.dir=/home/solr/solr-5.3.1/server
1186 INFO (main) [ ] o.a.s.s.SolrDispatchFilter SolrDispatchFilter.init() done
1216 INFO (main) [ ] o.e.j.s.h.ContextHandler Started 
o.e.j.w.WebAppContext@1a75e76a{/solr,file:/home/solr/solr-5.3.1/server/solr-webapp/webapp/,AVAILABLE}{/home/solr/solr-5.3.1/server/solr-webapp/webapp}
1224 INFO (main) [ ] o.e.j.s.ServerConnector Started ServerConnector@2102a4d5
{HTTP/1.1} {0.0.0.0:8983}
1228 INFO (main) [ ] o.e.j.s.Server Started @1762ms
14426 WARN (qtp1045997582-15) [ ] o.a.s.h.a.SystemInfoHandler Error 

Problem encountered upon starting Solr after improper exit

2018-03-14 Thread YIFAN LI
To whom it may concern,

I am running Solr 7.1.0 and encountered a problem starting Solr after I
killed the Java process running Solr without proper cleanup. The error
message that I received is as follows:

solr-7.1.0 liyifan$ bin/solr run


dyld: Library not loaded: /usr/local/opt/mpfr/lib/libmpfr.4.dylib

  Referenced from: /usr/local/bin/awk

  Reason: image not found

Your current version of Java is too old to run this version of Solr

We found version , using command 'java -version', with response:

java version "1.8.0_45"

Java(TM) SE Runtime Environment (build 1.8.0_45-b14)

Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)


Please install latest version of Java 1.8 or set JAVA_HOME properly.


Debug information:

JAVA_HOME: N/A

Active Path:

/Users/liyifan/anaconda3/bin:/Library/Frameworks/Python.framework/Versions/3.5/bin:/opt/local/bin:/opt/local/sbin:/usr/Documents/2016\
Spring/appcivist-mobilization/activator-dist-1.3.9/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/opt/X11/bin:/usr/local/git/bin

After I reset the JAVA_HOME variable, it still gives me the error:

bin/solr start

dyld: Library not loaded: /usr/local/opt/mpfr/lib/libmpfr.4.dylib

  Referenced from: /usr/local/bin/awk

  Reason: image not found

Your current version of Java is too old to run this version of Solr

We found version , using command
'/Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home/bin/java
-version', with response:

java version "1.8.0_45"

Java(TM) SE Runtime Environment (build 1.8.0_45-b14)

Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)


Please install latest version of Java 1.8 or set JAVA_HOME properly.


Debug information:

JAVA_HOME: /Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home

Active Path:

/Users/liyifan/anaconda3/bin:/Library/Frameworks/Python.framework/Versions/3.5/bin:/opt/local/bin:/opt/local/sbin:/usr/Documents/2016\
Spring/appcivist-mobilization/activator-dist-1.3.9/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/opt/X11/bin:/usr/local/git/bin:/Library/Java/JavaVirtualMachines/jdk1.8.0_45.jdk/Contents/Home/bin

and the directory /usr/local/opt/mpfr/lib/ only contains the following files:

ls /usr/local/opt/mpfr/lib/

libmpfr.6.dylib libmpfr.a libmpfr.dylib pkgconfig

Do you think this problem is caused by killing the Java process without
proper cleanup? Could you suggest some solution to this problem? Thank you
very much!

Best,
Yifan


Re: Disable leaders in SolrCloud mode

2016-05-16 Thread Li Ding
This happened the second time I performed a restart.  But after that,
every time, this collection is stuck here.  If I restart the leader node
as well, the core can get out of the recovering state.

On Mon, May 16, 2016 at 5:00 PM, Li Ding <li.d...@bloomreach.com> wrote:

> Hi Anshum,
>
> This is for restart solr with 1000 collections.  I created an environment
> with 1023 collections today All collections are empty.  During repeated
> restart test, one of the cores are marked as "recovering" and stuck there
> for ever.   The solr is 4.6.1 and we have 3 zk hosts and 8 solr hosts, here
> is the relevant logs:
>
> ---This is the logs for the core stuck at "recovering"
>
> INFO  - 2016-05-16 22:47:04.984; org.apache.solr.cloud.ZkController;
> publishing core=test_collection_112_shard1_replica2 state=down
>
> INFO  - 2016-05-16 22:47:05.999; org.apache.solr.core.SolrCore;
> [test_collection_112_shard1_replica2]  CLOSING SolrCore
> org.apache.solr.core.SolrCore@1e48619
>
> INFO  - 2016-05-16 22:47:06.001; org.apache.solr.core.SolrCore;
> [test_collection_112_shard1_replica2] Closing main searcher on request.
>
> INFO  - 2016-05-16 22:47:06.001;
> org.apache.solr.core.CachingDirectoryFactory; looking to close /mnt
> /solrcloud_latest/solr/test_collection_112_shard1_replica2/data/index
> [CachedDir<

Re: Disable leaders in SolrCloud mode

2016-05-16 Thread Li Ding
Hi Anshum,

This is for restarting Solr with 1000 collections.  I created an environment
with 1023 collections today; all collections are empty.  During repeated
restart tests, one of the cores is marked as "recovering" and stuck there
forever.   The Solr version is 4.6.1 and we have 3 ZK hosts and 8 Solr hosts; here
are the relevant logs:

---These are the logs for the core stuck at "recovering":

INFO  - 2016-05-16 22:47:04.984; org.apache.solr.cloud.ZkController;
publishing core=test_collection_112_shard1_replica2 state=down

INFO  - 2016-05-16 22:47:05.999; org.apache.solr.core.SolrCore;
[test_collection_112_shard1_replica2]  CLOSING SolrCore
org.apache.solr.core.SolrCore@1e48619

INFO  - 2016-05-16 22:47:06.001; org.apache.solr.core.SolrCore;
[test_collection_112_shard1_replica2] Closing main searcher on request.

INFO  - 2016-05-16 22:47:06.001;
org.apache.solr.core.CachingDirectoryFactory; looking to close /mnt
/solrcloud_latest/solr/test_collection_112_shard1_replica2/data/index

Disable leaders in SolrCloud mode

2016-05-16 Thread Li Ding
Hi all,

We have a unique scenario where we don't need leaders in every collection
to recover from failures.  The index never changes.  But we have faced
problems where either ZK marked a core as down while the core was fine for
non-distributed queries, or, during restart, the core never came up.  My
question is: is there any simple way to disable leaders and
leader election in SolrCloud?  We do use multi-shard and distributed
queries, but in our situation we don't need leaders to maintain
the correct status of the index.  So if we could get rid of that part, our
Solr restarts would be more robust.

Any suggestions will be appreciated.

Thanks,

Li


Re: Questions on SolrCloud core state, when will Solr recover a "DOWN" core to "ACTIVE" core.

2016-04-27 Thread Li Ding
Hi Erick,

I don't have the GC logs.  But after the GC finished, shouldn't the ZK ping
succeed and the core come back to the normal state?  From the log I
posted, the sequence is:

1) Solr detects it can't connect to ZK and reconnects to ZK
2) Solr marks all cores as down
3) Solr recovers each core; some succeed, some fail
4) After 30 minutes, the cores that failed are still marked as down

So my question is: during the 30-minute interval, if GC took too long,
all cores should have failed.  And GC doesn't take longer than a minute, since
all requests served to other cores succeed, and the next ZK ping should
bring the core back to normal, right?  We have an active monitor running at
the same time querying every core in distrib=false mode, and every query
succeeds.
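
For next time, we can start Solr with GC logging enabled so we can check the
pause times ourselves; a sketch (the log path is a placeholder):

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
-XX:+PrintGCApplicationStoppedTime -Xloggc:/var/solr/logs/solr_gc.log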

Thanks,

Li

On Tue, Apr 26, 2016 at 6:20 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> One of the reasons this happens is if you have very
> long GC cycles, longer than the Zookeeper "keep alive"
> timeout. During a full GC pause, Solr is unresponsive and
> if the ZK ping times out, ZK assumes the machine is
> gone and you get into this recovery state.
>
> So I'd collect GC logs and see if you have any
> stop-the-world GC pauses that take longer than the ZK
> timeout.
>
> see Mark Millers primer on GC here:
> https://lucidworks.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/
>
> Best,
> Erick
>
> On Tue, Apr 26, 2016 at 2:13 PM, Li Ding <li.d...@bloomreach.com> wrote:
> > Thank you all for your help!
> >
> > The zookeeper log rolled over; this is from solr.log:
> >
> > Looks like the solr and zk connection is gone for some reason
> >
> > INFO  - 2016-04-21 12:37:57.536;
> > org.apache.solr.common.cloud.ConnectionManager; Watcher
> > org.apache.solr.common.cloud.ConnectionManager@19789a96
> > name:ZooKeeperConnection Watcher:{ZK HOSTS here} got event WatchedEvent
> > state:Disconnected type:None path:null path:null type:None
> >
> > INFO  - 2016-04-21 12:37:57.536;
> > org.apache.solr.common.cloud.ConnectionManager; zkClient has disconnected
> >
> > INFO  - 2016-04-21 12:38:24.248;
> > org.apache.solr.common.cloud.DefaultConnectionStrategy; Connection
> expired
> > - starting a new one...
> >
> > INFO  - 2016-04-21 12:38:24.262;
> > org.apache.solr.common.cloud.ConnectionManager; Waiting for client to
> > connect to ZooKeeper
> >
> > INFO  - 2016-04-21 12:38:24.269;
> > org.apache.solr.common.cloud.ConnectionManager; Connected:true
> >
> >
> > Then it publishes all cores on the hosts are down.  I just list three
> cores
> > here:
> >
> > INFO  - 2016-04-21 12:38:24.269; org.apache.solr.cloud.ZkController;
> > publishing core=product1_shard1_replica1 state=down
> >
> > INFO  - 2016-04-21 12:38:24.271; org.apache.solr.cloud.ZkController;
> > publishing core=collection1 state=down
> >
> > INFO  - 2016-04-21 12:38:24.272; org.apache.solr.cloud.ZkController;
> > numShards not found on descriptor - reading it from system property
> >
> > INFO  - 2016-04-21 12:38:24.289; org.apache.solr.cloud.ZkController;
> > publishing core=product2_shard5_replica1 state=down
> >
> > INFO  - 2016-04-21 12:38:24.292; org.apache.solr.cloud.ZkController;
> > publishing core=product2_shard13_replica1 state=down
> >
> >
> > product1 has only one shard one replica and it's able to be active
> > successfully:
> >
> > INFO  - 2016-04-21 12:38:26.383; org.apache.solr.cloud.ZkController;
> > Register replica - core:product1_shard1_replica1 address:http://
> > {internalIp}:8983/solr collection:product1 shard:shard1
> >
> > WARN  - 2016-04-21 12:38:26.385; org.apache.solr.cloud.ElectionContext;
> > cancelElection did not find election node to remove
> >
> > INFO  - 2016-04-21 12:38:26.393;
> > org.apache.solr.cloud.ShardLeaderElectionContext; Running the leader
> > process for shard shard1
> >
> > INFO  - 2016-04-21 12:38:26.399;
> > org.apache.solr.cloud.ShardLeaderElectionContext; Enough replicas found
> to
> > continue.
> >
> > INFO  - 2016-04-21 12:38:26.399;
> > org.apache.solr.cloud.ShardLeaderElectionContext; I may be the new
> leader -
> > try and sync
> >
> > INFO  - 2016-04-21 12:38:26.399; org.apache.solr.cloud.SyncStrategy; Sync
> > replicas to http://{internalIp}:8983/solr/product1_shard1_replica1/
> >
> > INFO  - 2016-04-21 12:38:26.399; org.apache.solr.cloud.SyncStrategy; Sync
> > Success - now sync replicas to me
> >
> > INFO  - 2016-04-21 12:38:26.399; org.apache.solr.cloud.SyncStrategy;
> > http://{internalIp}:898

Re: Questions on SolrCloud core state, when will Solr recover a "DOWN" core to "ACTIVE" core.

2016-04-26 Thread Li Ding
;
org.apache.solr.cloud.ShardLeaderElectionContext; I am the new leader:
http://{internalIp}:8983/solr/product2_shard5_replica1_shard5_replica1/
shard5

INFO  - 2016-04-21 12:38:26.632; org.apache.solr.common.cloud.SolrZkClient;
makePath: /collections/product2_shard5_replica1/leaders/shard5

INFO  - 2016-04-21 12:38:26.645; org.apache.solr.cloud.ZkController; We are
http://{internalIp}:8983/solr/product2_shard5_replica1_shard5_replica1/ and
leader is http://{internalIp}:8983/solr
product2_shard5_replica1_shard5_replica1/

INFO  - 2016-04-21 12:38:26.646;
org.apache.solr.common.cloud.ZkStateReader; Updating cloud state from
ZooKeeper...


Before I restarted this server, a bunch of queries failed for this
collection product2.  But I don't think that would affect the core status.


Do you have any idea why this particular core is not published
as active?  From the log, most steps are done except the very last one,
publishing the info to ZK.


Thanks,


Li
On Thu, Apr 21, 2016 at 7:08 AM, Rajesh Hazari <rajeshhaz...@gmail.com>
wrote:

> Hi Li,
>
> Do you see timeouts like "CLUSTERSTATUS the collection time out:180s"?
> If that's the case, this may be related to
> https://issues.apache.org/jira/browse/SOLR-7940,
> and i would say either use the patch file or upgrade.
>
>
> *Thanks,*
> *Rajesh,*
> *8328789519,*
> *If I don't answer your call please leave a voicemail with your contact
> info, *
> *will return your call ASAP.*
>
> On Thu, Apr 21, 2016 at 6:02 AM, YouPeng Yang <yypvsxf19870...@gmail.com>
> wrote:
>
> > Hi
> >We have used Solr 4.6 for 2 years. If you post more logs, maybe we can
> > fix it.
> >
> > 2016-04-21 6:50 GMT+08:00 Li Ding <li.d...@bloomreach.com>:
> >
> > > Hi All,
> > >
> > > We are using SolrCloud 4.6.1.  We have observed following behaviors
> > > recently.  A Solr node in a Solrcloud cluster is up but some of the
> cores
> > > on the nodes are marked as down in Zookeeper.  If the cores are parts
> of
> > a
> > > multi-sharded collection with one replica,  the queries to that
> > collection
> > > will fail.  However, when this happened, if we issue queries to the
> core
> > > directly, it returns 200 and correct info.  But once Solr got into the
> > > state, the core will be marked down forever unless we do a restart on
> > Solr.
> > >
> > > Has anyone seen this behavior before?  Is there any to get out of the
> > state
> > > on its own?
> > >
> > > Thanks,
> > >
> > > Li
> > >
> >
>


Questions on SolrCloud core state, when will Solr recover a "DOWN" core to "ACTIVE" core.

2016-04-20 Thread Li Ding
Hi All,

We are using SolrCloud 4.6.1.  We have observed the following behavior
recently.  A Solr node in a SolrCloud cluster is up, but some of the cores
on the node are marked as down in ZooKeeper.  If the cores are part of a
multi-sharded collection with one replica, the queries to that collection
will fail.  However, when this happens, if we issue queries to the core
directly, it returns 200 and correct info.  But once Solr gets into this
state, the core will be marked down forever unless we do a restart of Solr.

Has anyone seen this behavior before?  Is there any way for it to get out of
the state on its own?

Thanks,

Li


Re: Why are these two queries different?

2015-05-12 Thread Frank li
Thanks for your help. I figured it out, just as you said. Appreciate your
help. Somehow I forgot to reply to your post.
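
For the archives, the change was along the lines you described: making the
query analyzer stop splitting numbers on commas. Roughly (a sketch only; the
tokenizer and attribute values depend on the actual fieldType, which I didn't
post here):

<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- keep "4,568,649" as the single token 4568649 instead of emitting 4 / 568 / 649 -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="0" catenateNumbers="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>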

On Wed, Apr 29, 2015 at 9:24 AM, Chris Hostetter hossman_luc...@fucit.org
wrote:


 : We did two SOLR qeries and they supposed to return the same results but
 : did not:

 the short answer is: if you want those queries to return the same results,
 then you need to adjust your query time analyzer for the all_text field to
 not split intra-numeric tokens on ','

 i don't know *why* exactly it's doing that, because you didn't give us the
 full details of your field/fieldtypes (or other really important info: the
 full request params -- echoParams=all -- and the documents matched by your
 second query, etc... https://wiki.apache.org/solr/UsingMailingLists )
 ... but that's the reason the queries are different as evident from the
 parsedquery output.


 : Query 1: all_text:(US 4,568,649 A)
 :
 : parsedquery: (+((all_text:us ((all_text:4 all_text:568 all_text:649
 : all_text:4568649)~4))~2))/no_coord,
 :
 : Result: numFound: 0,
 :
 : Query 2: all_text:(US 4568649)
 :
 : parsedquery: (+((all_text:us all_text:4568649)~2))/no_coord,
 :
 :
 : Result: numFound: 2,
 :
 :
 : We assumed the two return the same result. Our default operator is AND.



 -Hoss
 http://www.lucidworks.com/



Re: JSON Facet Analytics API in Solr 5.1

2015-05-10 Thread Frank li
Thank you, Yonik!

Looks cool to me. The only problem is that it is not working for me.
I see you have "cats" and "cat" in your URL. "cat" must be a field name.
What is "cats"?

We are doing a POC with facet count ascending. Your help is really important
to us.



On Sat, May 9, 2015 at 8:05 AM, Yonik Seeley ysee...@gmail.com wrote:

 curl -g 
 "http://localhost:8983/solr/techproducts/query?q=*:*&json.facet={cats:{terms:{field:cat,sort:'count+asc'}}}"
 

 Using curl with everything in the URL is definitely trickier.
 Everything needs to be URL escaped.  If it's not, curl will often
 silently do nothing.
 For example, when I had sort:'count asc' , the command above would do
 nothing.  When I remembered to URL encode the space as a +, it
 started working.

 It's definitely easier to use -d with curl...

 curl "http://localhost:8983/solr/techproducts/query" -d
 'q=*:*&json.facet={cats:{terms:{field:cat,sort:"count asc"}}}'

 That also allows you to format it nicer for reading as well:

 curl "http://localhost:8983/solr/techproducts/query" -d
 'q=*:*&json.facet=
 {cats:{terms:{
   field:cat,
   sort:"count asc"
 }}}'

 -Yonik


 On Thu, May 7, 2015 at 5:32 PM, Frank li fudon...@gmail.com wrote:
  This one does not have problem, but how do I include sort in this facet
  query. Basically, I want to write a solr query which can sort the facet
  count ascending. Something like http://localhost:8983/solr
  /demo/query?q=applejson.facet={field=price sort='count asc'}
  
 
 
  I really appreciate your help.
 
  Frank
 
  
 
 
  On Thu, May 7, 2015 at 2:24 PM, Yonik Seeley ysee...@gmail.com wrote:
 
  On Thu, May 7, 2015 at 4:47 PM, Frank li fudon...@gmail.com wrote:
   Hi Yonik,
  
   I am reading your blog. It is helpful. One question for you, for
  following
   example,
  
   curl http://localhost:8983/solr/query -d 'q=*:*rows=0
json.facet={
  categories:{
type : terms,
field : cat,
sort : { x : desc},
facet:{
  x : avg(price),
  y : sum(price)
}
  }
}
   '
  
  
   If I want to write it in the format of this:
  
 
 http://localhost:8983/solr/query?q=applejson.facet={x:'avg(campaign_ult_defendant_cnt_is)
 '}
  ,
   how do I do?
 
  What problems do you encounter when you try that?
 
  If you try that URL with curl, be aware that curly braces {} are
  special globbing characters in curl.  Turn them off with the -g
  option:
 
  curl -g 
 
 http://localhost:8983/solr/demo/query?q=applejson.facet={x:'avg(price)'}
 
  -Yonik
 



Re: JSON Facet Analytics API in Solr 5.1

2015-05-10 Thread Frank li
Here is our Solr query:

http://qa-solr:8080/solr/select?q=type:PortalCase&json.facet={categories:{terms:{field:campaign_id_ls,sort:%27count+asc%27}}}&rows=0

I replaced "cats" with "categories". It is still not working.
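
For the record, the same request written with curl -d (as suggested), with the
parameters separated by & and the sort value quoted, would look roughly like
this (the select path and field name are ours):

curl "http://qa-solr:8080/solr/select" -d \
  'q=type:PortalCase&rows=0&json.facet={categories:{terms:{field:campaign_id_ls,sort:"count asc"}}}'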

On Sun, May 10, 2015 at 12:10 AM, Frank li fudon...@gmail.com wrote:

 Thank you, Yonik!

 Looks cool to me. Only problem is it is not working for me.
 I see you have cats and cat in your URL. cat must be a field name.
 What is cats?

 We are doing a POC with facet count ascending. You help is really
 important to us.



 On Sat, May 9, 2015 at 8:05 AM, Yonik Seeley ysee...@gmail.com wrote:

 curl -g 
 http://localhost:8983/solr/techproducts/query?q=*:*json.facet={cats:{terms:{field:cat,sort:'count+asc'}}}
 

 Using curl with everything in the URL is definitely trickier.
 Everything needs to be URL escaped.  If it's not, curl will often
 silently do nothing.
 For example, when I had sort:'count asc' , the command above would do
 nothing.  When I remembered to URL encode the space as a +, it
 started working.

 It's definitely easier to use -d with curl...

 curl  http://localhost:8983/solr/techproducts/query; -d
 'q=*:*json.facet={cats:{terms:{field:cat,sort:count asc}}}'

 That also allows you to format it nicer for reading as well:

 curl  http://localhost:8983/solr/techproducts/query; -d
 'q=*:*json.facet=
 {cats:{terms:{
   field:cat,
   sort:count asc
 }}}'

 -Yonik


 On Thu, May 7, 2015 at 5:32 PM, Frank li fudon...@gmail.com wrote:
  This one does not have problem, but how do I include sort in this
 facet
  query. Basically, I want to write a solr query which can sort the facet
  count ascending. Something like http://localhost:8983/solr
  /demo/query?q=applejson.facet={field=price sort='count asc'}
  
 
 
  I really appreciate your help.
 
  Frank
 
  
 
 
  On Thu, May 7, 2015 at 2:24 PM, Yonik Seeley ysee...@gmail.com wrote:
 
  On Thu, May 7, 2015 at 4:47 PM, Frank li fudon...@gmail.com wrote:
   Hi Yonik,
  
   I am reading your blog. It is helpful. One question for you, for
  following
   example,
  
   curl http://localhost:8983/solr/query -d 'q=*:*rows=0
json.facet={
  categories:{
type : terms,
field : cat,
sort : { x : desc},
facet:{
  x : avg(price),
  y : sum(price)
}
  }
}
   '
  
  
   If I want to write it in the format of this:
  
 
 http://localhost:8983/solr/query?q=applejson.facet={x:'avg(campaign_ult_defendant_cnt_is)
 '}
  ,
   how do I do?
 
  What problems do you encounter when you try that?
 
  If you try that URL with curl, be aware that curly braces {} are
  special globbing characters in curl.  Turn them off with the -g
  option:
 
  curl -g 
 
 http://localhost:8983/solr/demo/query?q=applejson.facet={x:'avg(price)'}
 
 
  -Yonik
 





Re: JSON Facet Analytics API in Solr 5.1

2015-05-10 Thread Frank li
I figured it out now. It works. "cats" is just a name, right? It does not
matter what is used.

Really appreciate your help. This is going to be really useful. I meant
json.facet.

On Sun, May 10, 2015 at 12:13 AM, Frank li fudon...@gmail.com wrote:

 Here is our SOLR query:


 http://qa-solr:8080/solr/select?q=type:PortalCasejson.facet={categories:{terms:{field:campaign_id_ls,sort:%27count+asc%27}}}rows=0

 I replaced cats with categories. It is still not working.

 On Sun, May 10, 2015 at 12:10 AM, Frank li fudon...@gmail.com wrote:

 Thank you, Yonik!

 Looks cool to me. Only problem is it is not working for me.
 I see you have cats and cat in your URL. cat must be a field name.
 What is cats?

 We are doing a POC with facet count ascending. You help is really
 important to us.



 On Sat, May 9, 2015 at 8:05 AM, Yonik Seeley ysee...@gmail.com wrote:

 curl -g 
 http://localhost:8983/solr/techproducts/query?q=*:*json.facet={cats:{terms:{field:cat,sort:'count+asc'}}}
 

 Using curl with everything in the URL is definitely trickier.
 Everything needs to be URL escaped.  If it's not, curl will often
 silently do nothing.
 For example, when I had sort:'count asc' , the command above would do
 nothing.  When I remembered to URL encode the space as a +, it
 started working.

 It's definitely easier to use -d with curl...

 curl  http://localhost:8983/solr/techproducts/query; -d
 'q=*:*json.facet={cats:{terms:{field:cat,sort:count asc}}}'

 That also allows you to format it nicer for reading as well:

 curl  http://localhost:8983/solr/techproducts/query; -d
 'q=*:*json.facet=
 {cats:{terms:{
   field:cat,
   sort:count asc
 }}}'

 -Yonik


 On Thu, May 7, 2015 at 5:32 PM, Frank li fudon...@gmail.com wrote:
  This one does not have problem, but how do I include sort in this
 facet
  query. Basically, I want to write a solr query which can sort the facet
  count ascending. Something like http://localhost:8983/solr
  /demo/query?q=applejson.facet={field=price sort='count asc'}
  
 
 
  I really appreciate your help.
 
  Frank
 
  
 
 
  On Thu, May 7, 2015 at 2:24 PM, Yonik Seeley ysee...@gmail.com
 wrote:
 
  On Thu, May 7, 2015 at 4:47 PM, Frank li fudon...@gmail.com wrote:
   Hi Yonik,
  
   I am reading your blog. It is helpful. One question for you, for
  following
   example,
  
   curl http://localhost:8983/solr/query -d 'q=*:*rows=0
json.facet={
  categories:{
type : terms,
field : cat,
sort : { x : desc},
facet:{
  x : avg(price),
  y : sum(price)
}
  }
}
   '
  
  
   If I want to write it in the format of this:
  
 
 http://localhost:8983/solr/query?q=applejson.facet={x:'avg(campaign_ult_defendant_cnt_is)
 '}
  ,
   how do I do?
 
  What problems do you encounter when you try that?
 
  If you try that URL with curl, be aware that curly braces {} are
  special globbing characters in curl.  Turn them off with the -g
  option:
 
  curl -g 
 
 http://localhost:8983/solr/demo/query?q=applejson.facet={x:'avg(price)'}
 
 
  -Yonik
 






Re: JSON Facet Analytics API in Solr 5.1

2015-05-08 Thread Frank li
Hi Yonik,

Any update on this question?

Thanks in advance,

Frank

On Thu, May 7, 2015 at 2:49 PM, Frank li fudon...@gmail.com wrote:

 Is there any book to read so I won't ask such dummy questions? Thanks.

 On Thu, May 7, 2015 at 2:32 PM, Frank li fudon...@gmail.com wrote:

 This one does not have problem, but how do I include sort in this facet
 query. Basically, I want to write a solr query which can sort the facet
 count ascending. Something like http://localhost:8983/solr
 /demo/query?q=applejson.facet={field=price sort='count asc'}

 I really appreciate your help.

 Frank



 On Thu, May 7, 2015 at 2:24 PM, Yonik Seeley ysee...@gmail.com wrote:

 On Thu, May 7, 2015 at 4:47 PM, Frank li fudon...@gmail.com wrote:
  Hi Yonik,
 
  I am reading your blog. It is helpful. One question for you, for
 following
  example,
 
  curl http://localhost:8983/solr/query -d 'q=*:*rows=0
   json.facet={
 categories:{
   type : terms,
   field : cat,
   sort : { x : desc},
   facet:{
 x : avg(price),
 y : sum(price)
   }
 }
   }
  '
 
 
  If I want to write it in the format of this:
 
 http://localhost:8983/solr/query?q=applejson.facet={x:'avg(campaign_ult_defendant_cnt_is)'}
 ,
  how do I do?

 What problems do you encounter when you try that?

 If you try that URL with curl, be aware that curly braces {} are
 special globbing characters in curl.  Turn them off with the -g
 option:

 curl -g 
 http://localhost:8983/solr/demo/query?q=applejson.facet={x:'avg(price)'}
 

 -Yonik






Re: JSON Facet Analytics API in Solr 5.1

2015-05-07 Thread Frank li
Hi Yonik,

I am reading your blog. It is helpful. One question for you, about the following
example:

curl http://localhost:8983/solr/query -d 'q=*:*&rows=0&
 json.facet={
   categories:{
 type : terms,
 field : cat,
 sort : { x : desc},
 facet:{
   x : avg(price),
   y : sum(price)
 }
   }
 }
'


If I want to write it in the format of
http://localhost:8983/solr/query?q=apple&json.facet={x:'avg(campaign_ult_defendant_cnt_is)'},
how do I do that?

Thanks,

Frank


On Mon, Apr 20, 2015 at 7:35 AM, Davis, Daniel (NIH/NLM) [C] 
daniel.da...@nih.gov wrote:

 Indeed - XML is not human readable if it contains colons, JSON is not
 human readable if it is too deep, and the objects/keys are not semantic.
 I also vote for flatter.

 -Original Message-
 From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
 Sent: Friday, April 17, 2015 11:16 PM
 To: solr-user@lucene.apache.org
 Subject: Re: JSON Facet  Analytics API in Solr 5.1

 Flatter please.  The other nested stuff makes my head hurt.  Until
 recently I thought I was the only person on the planet who had a hard time
 mentally parsing anything but the simplest JSON, but then I learned that
 I'm not alone at all it's just that nobody is saying it. :)

 Otis
 --
 Monitoring * Alerting * Anomaly Detection * Centralized Log Management
 Solr  Elasticsearch Support * http://sematext.com/



 On Fri, Apr 17, 2015 at 7:26 PM, Trey Grainger solrt...@gmail.com wrote:

  Agreed, I also prefer the second way. I find it more readible, less
  verbose while communicating the same information, less confusing to
  mentally parse (is 'terms' the name of my facet, or the type of my
  facet?...), and less prone to syntactlcally valid, but logically
  invalid inputs.  Let's break those topics down.
 
  *1) Less verbose while communicating the same information:* The
  flatter structure is particularly useful when you have nested facets
  to reduce unnecessary verbosity / extra levels. Let's contrast the two
  approaches with just 2 levels of subfacets:
 
  ** Current Format **
  top_genres:{
  terms:{
  field: genre,
  limit: 5,
  facet:{
  top_authors:{
  terms:{
  field: author,
  limit: 4,
  facet: {
  top_books:{
  terms:{
  field: title,
  limit: 5
 }
 }
  }
  }
  }
  }
  }
  }
 
  ** Flat Format **
  top_genres:{
  type: terms,
  field: genre,
  limit: 5,
  facet:{
  top_authors:{
  type: terms
  field: author,
  limit: 4,
  facet: {
  top_books:{
  type: terms
  field: title,
  limit: 5
 }
  }
  }
  }
  }
 
  The flat format is clearly shorter and more succinct, while
  communicating the same information. What value do the extra levels add?
 
 
  *2) Less confusing to mentally parse*
  I also find the flatter structure less confusing, as I'm consistently
  having to take a mental pause with the current format to verify
  whether terms is the name of my facet or the type of my facet and
  have to count the curly braces to figure this out.  Not that I would
  name my facets like this, but to give an extreme example of why that
  extra mental calculation is necessary due to the name of an attribute
  in the structure being able to represent both a facet name and facet
 type:
 
  terms: {
  terms: {
  field: genre,
  limit: 5,
  facet: {
  terms: {
  terms:{
  field: author
  limit: 4
  }
  }
  }
  }
  }
 
  In this example, the first terms is a facet name, the second terms
  is a facet type, the third is a facet name, etc. Even if you don't
  name your facets like this, it still requires parsing someone else's
  query mentally to ensure that's not what was done.
 
  3) *Less prone to syntactically valid, but logically invalid inputs*
  Also, given this first format (where the type is indicated by one of
  several possible attributes: terms, range, etc.), what happens if I
  pass in multiple of the valid JSON attributes... the flatter structure
  prevents this from being possible (which is a good thing!):
 
  top_authors : {
  terms : {
  field : author,
  limit : 5
  },
  range : {
  field : price,
  start : 0,
  end : 100,
  gap : 20
  }
  }
 
  I don't think the response format can currently handle this without
  adding in extra levels to make it look like the input side, so this is
  an exception case even thought it seems 

Re: JSON Facet Analytics API in Solr 5.1

2015-05-07 Thread Frank li
This one does not have a problem, but how do I include sort in this facet
query? Basically, I want to write a Solr query which can sort the facet
count ascending. Something like http://localhost:8983/solr
/demo/query?q=apple&json.facet={field=price sort='count asc'}

I really appreciate your help.

Frank

On Thu, May 7, 2015 at 2:24 PM, Yonik Seeley ysee...@gmail.com wrote:

 On Thu, May 7, 2015 at 4:47 PM, Frank li fudon...@gmail.com wrote:
  Hi Yonik,
 
  I am reading your blog. It is helpful. One question for you, for
 following
  example,
 
  curl http://localhost:8983/solr/query -d 'q=*:*rows=0
   json.facet={
 categories:{
   type : terms,
   field : cat,
   sort : { x : desc},
   facet:{
 x : avg(price),
 y : sum(price)
   }
 }
   }
  '
 
 
  If I want to write it in the format of this:
 
 http://localhost:8983/solr/query?q=applejson.facet={x:'avg(campaign_ult_defendant_cnt_is)'}
 ,
  how do I do?

 What problems do you encounter when you try that?

 If you try that URL with curl, be aware that curly braces {} are
 special globbing characters in curl.  Turn them off with the -g
 option:

 curl -g 
 http://localhost:8983/solr/demo/query?q=applejson.facet={x:'avg(price)'}

 -Yonik



Re: JSON Facet Analytics API in Solr 5.1

2015-05-07 Thread Frank li
Is there any book to read so I won't ask such dummy questions? Thanks.

On Thu, May 7, 2015 at 2:32 PM, Frank li fudon...@gmail.com wrote:

 This one does not have problem, but how do I include sort in this facet
 query. Basically, I want to write a solr query which can sort the facet
 count ascending. Something like http://localhost:8983/solr
 /demo/query?q=applejson.facet={field=price sort='count asc'}

 I really appreciate your help.

 Frank



 On Thu, May 7, 2015 at 2:24 PM, Yonik Seeley ysee...@gmail.com wrote:

 On Thu, May 7, 2015 at 4:47 PM, Frank li fudon...@gmail.com wrote:
  Hi Yonik,
 
  I am reading your blog. It is helpful. One question for you, for
 following
  example,
 
  curl http://localhost:8983/solr/query -d 'q=*:*rows=0
   json.facet={
 categories:{
   type : terms,
   field : cat,
   sort : { x : desc},
   facet:{
 x : avg(price),
 y : sum(price)
   }
 }
   }
  '
 
 
  If I want to write it in the format of this:
 
 http://localhost:8983/solr/query?q=applejson.facet={x:'avg(campaign_ult_defendant_cnt_is)'}
 ,
  how do I do?

 What problems do you encounter when you try that?

 If you try that URL with curl, be aware that curly braces {} are
 special globbing characters in curl.  Turn them off with the -g
 option:

 curl -g 
 http://localhost:8983/solr/demo/query?q=applejson.facet={x:'avg(price)'}
 

 -Yonik





Why are these two queries different?

2015-04-27 Thread Frank li
We did two Solr queries and they were supposed to return the same results but
did not:

Query 1: all_text:(US 4,568,649 A)

parsedquery: (+((all_text:us ((all_text:4 all_text:568 all_text:649
all_text:4568649)~4))~2))/no_coord,

Result: numFound: 0,

Query 2: all_text:(US 4568649)

parsedquery: (+((all_text:us all_text:4568649)~2))/no_coord,


Result: numFound: 2,


We assumed the two queries would return the same result. Our default operator is AND.


Re: Config join parse in solrconfig.xml

2015-04-07 Thread Frank li
Cool. It actually works after I removed those extra columns. Thanks for
your help.
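
Concretely, the change amounted to trimming the df default down to a single
field, roughly like this (assuming all_text is the field we kept):

<str name="df">all_text</str>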

On Mon, Apr 6, 2015 at 8:19 PM, Erick Erickson erickerick...@gmail.com
wrote:

 df does not allow multiple fields, it stands for default field, not
 default fields. To get what you're looking for, you need to use
 edismax or explicitly create the multiple clauses.

 I'm not quite sure what the join parser is doing with the df
 parameter. So my first question is what happens if you just use a
 single field for df?.

 Best,
 Erick

 On Mon, Apr 6, 2015 at 11:51 AM, Frank li fudon...@gmail.com wrote:
  The error message was from the query with debug=query.
 
  On Mon, Apr 6, 2015 at 11:49 AM, Frank li fudon...@gmail.com wrote:
 
  Hi Erick,
 
 
  Thanks for your response.
 
  Here is the query I am sending:
 
 
 http://dev-solr:8080/solr/collection1/select?q={!join+from=litigation_id_ls+to=lit_id_lms}all_text:applefq=type:PartyLawyerLawfirmfacet=truefacet.field=lawyer_id_lmsfacet.mincount=1rows=0
  
 
 
  You can see it has all_text:apple. I added field name all_text,
  because it gives error without it.
 
  Errors:
 
  lst name=errorstr name=msgundefined field all_text number party
  name all_code ent_name/strint name=code400/int/lst
 
 
  These fields are defined as the default search fields in our
  solr_config.xml file:
 
  str name=dfall_text number party name all_code ent_name/str
 
 
  Thanks,
 
  Fudong
 
  On Fri, Apr 3, 2015 at 1:31 PM, Erick Erickson erickerick...@gmail.com
 
  wrote:
 
  You have to show us several more things:
 
  1 what exactly does the query look like?
  2 what do you expect?
  3 output when you specify debug=query
  4 anything else that would help. You might review:
 
  http://wiki.apache.org/solr/UsingMailingLists
 
  Best,
  Erick
 
  On Fri, Apr 3, 2015 at 10:58 AM, Frank li fudon...@gmail.com wrote:
   Hi,
  
   I am starting using join parser with our solr. We have some default
  fields.
   They are defined in solrconfig.xml:
  
 lst name=defaults
  str name=defTypeedismax/str
  str name=echoParamsexplicit/str
  int name=rows10/int
  str name=dfall_text number party name all_code
 ent_name/str
  str name=qfall_text number^3 name^5 party^3 all_code^2
   ent_name^7/str
  str name=flid description market_sector_type parent
  ult_parent
   ent_name title patent_title *_ls *_lms *_is *_texts *_ac *_as *_s
 *_ss
  *_ds
   *_sms *_ss *_bs/str
  str name=q.opAND/str
/lst
  
  
   I found out once I use join parser, it does not recognize the default
   fields any more. How do I modify the configuration for this?
  
   Thanks,
  
   Fred
 
 
 



Re: Config join parse in solrconfig.xml

2015-04-06 Thread Frank li
Hi Erick,


Thanks for your response.

Here is the query I am sending:
http://dev-solr:8080/solr/collection1/select?q={!join+from=litigation_id_ls+to=lit_id_lms}all_text:apple&fq=type:PartyLawyerLawfirm&facet=true&facet.field=lawyer_id_lms&facet.mincount=1&rows=0

You can see it has "all_text:apple". I added the field name "all_text" because
it gives an error without it.

Errors:

<lst name="error"><str name="msg">undefined field all_text number party
name all_code ent_name</str><int name="code">400</int></lst>


These fields are defined as the default search fields in our
solr_config.xml file:

<str name="df">all_text number party name all_code ent_name</str>


Thanks,

Fudong

On Fri, Apr 3, 2015 at 1:31 PM, Erick Erickson erickerick...@gmail.com
wrote:

 You have to show us several more things:

 1 what exactly does the query look like?
 2 what do you expect?
 3 output when you specify debug=query
 4 anything else that would help. You might review:

 http://wiki.apache.org/solr/UsingMailingLists

 Best,
 Erick

 On Fri, Apr 3, 2015 at 10:58 AM, Frank li fudon...@gmail.com wrote:
  Hi,
 
  I am starting using join parser with our solr. We have some default
 fields.
  They are defined in solrconfig.xml:
 
lst name=defaults
 str name=defTypeedismax/str
 str name=echoParamsexplicit/str
 int name=rows10/int
 str name=dfall_text number party name all_code ent_name/str
 str name=qfall_text number^3 name^5 party^3 all_code^2
  ent_name^7/str
 str name=flid description market_sector_type parent ult_parent
  ent_name title patent_title *_ls *_lms *_is *_texts *_ac *_as *_s *_ss
 *_ds
  *_sms *_ss *_bs/str
 str name=q.opAND/str
   /lst
 
 
  I found out once I use join parser, it does not recognize the default
  fields any more. How do I modify the configuration for this?
 
  Thanks,
 
  Fred



Re: Config join parse in solrconfig.xml

2015-04-06 Thread Frank li
The error message was from the query with debug=query.

On Mon, Apr 6, 2015 at 11:49 AM, Frank li fudon...@gmail.com wrote:

 Hi Erick,


 Thanks for your response.

 Here is the query I am sending:

 http://dev-solr:8080/solr/collection1/select?q={!join+from=litigation_id_ls+to=lit_id_lms}all_text:applefq=type:PartyLawyerLawfirmfacet=truefacet.field=lawyer_id_lmsfacet.mincount=1rows=0

 You can see it has all_text:apple. I added field name all_text,
 because it gives error without it.

 Errors:

 lst name=errorstr name=msgundefined field all_text number party
 name all_code ent_name/strint name=code400/int/lst


 These fields are defined as the default search fields in our
 solr_config.xml file:

 str name=dfall_text number party name all_code ent_name/str


 Thanks,

 Fudong

 On Fri, Apr 3, 2015 at 1:31 PM, Erick Erickson erickerick...@gmail.com
 wrote:

 You have to show us several more things:

 1 what exactly does the query look like?
 2 what do you expect?
 3 output when you specify debug=query
 4 anything else that would help. You might review:

 http://wiki.apache.org/solr/UsingMailingLists

 Best,
 Erick

 On Fri, Apr 3, 2015 at 10:58 AM, Frank li fudon...@gmail.com wrote:
  Hi,
 
  I am starting using join parser with our solr. We have some default
 fields.
  They are defined in solrconfig.xml:
 
lst name=defaults
 str name=defTypeedismax/str
 str name=echoParamsexplicit/str
 int name=rows10/int
 str name=dfall_text number party name all_code ent_name/str
 str name=qfall_text number^3 name^5 party^3 all_code^2
  ent_name^7/str
 str name=flid description market_sector_type parent
 ult_parent
  ent_name title patent_title *_ls *_lms *_is *_texts *_ac *_as *_s *_ss
 *_ds
  *_sms *_ss *_bs/str
 str name=q.opAND/str
   /lst
 
 
  I found out once I use join parser, it does not recognize the default
  fields any more. How do I modify the configuration for this?
 
  Thanks,
 
  Fred





Config join parse in solrconfig.xml

2015-04-03 Thread Frank li
Hi,

I am starting to use the join parser with our Solr. We have some default fields.
They are defined in solrconfig.xml:

  <lst name="defaults">
   <str name="defType">edismax</str>
   <str name="echoParams">explicit</str>
   <int name="rows">10</int>
   <str name="df">all_text number party name all_code ent_name</str>
   <str name="qf">all_text number^3 name^5 party^3 all_code^2
ent_name^7</str>
   <str name="fl">id description market_sector_type parent ult_parent
ent_name title patent_title *_ls *_lms *_is *_texts *_ac *_as *_s *_ss *_ds
*_sms *_ss *_bs</str>
   <str name="q.op">AND</str>
 </lst>


I found out that once I use the join parser, it does not recognize the default
fields any more. How do I modify the configuration for this?

Thanks,

Fred


sort and group.sort

2014-11-19 Thread Frank li
We have a query which has both sort and group.sort. What we are expecting
is that we can use sort to order the groups, but inside each group we have a
different sort.

However, it looks like sort is overwriting the sorting order inside the groups.
Can any one of you help us on this?

Basically we want to sort the groups in one way but sort inside the group
in another way.
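
To make the intent concrete, the request looks roughly like this (the field
names here are made up):

q=*:*&group=true&group.field=campaign_id&sort=score desc&group.sort=price asc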

Thanks,

Fudong


Re: Solr add document over 20 times slower after upgrade from 4.0 to 4.9

2014-09-05 Thread Li, Ryan
Hi Shawn,

Thanks for your reply.

The memory settings of my Solr box are:

12G physical memory.
4G for Java (-Xmx4096m).
The index size is around 4G in Solr 4.9; I think it was over 6G in Solr 4.0.

I do think the Java heap size is one of the reasons for this slowness. I'm 
doing one big commit, and when the ingestion process is 50% finished, I can see the 
Solr server has already used over 90% of the full memory.

I'll try to assign more RAM to the Solr Java process. But from your experience, does 4G 
sound like a good number for the Java heap size in my scenario? Is there any way 
to reduce memory usage during index time? (One thing I know is to do a few commits 
instead of one commit.)  My concern is that, given I have 12G in total, if I 
assign too much to the Solr server, I may not have enough for the OS to cache the Solr 
index files.

I had a look at the solrconfig file, but couldn't find anything obviously 
wrong. Just wondering which parts of that config file would impact the index 
time?
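
For instance, is it settings like these that matter most? (Just the kind of
thing I have in mind; the values are placeholders.)

<!-- inside <indexConfig>: cap indexing RAM per core -->
<ramBufferSizeMB>100</ramBufferSizeMB>

<!-- inside <updateHandler>: commit periodically instead of one huge commit -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>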

Thanks,
Ryan





One possible source of problems with that particular upgrade is the fact
that stored field compression was added in 4.1, and termvector
compression was added in 4.2.  They are on by default and cannot be
turned off.  The compression is typically fast, but with very large
documents like yours, it might result in pretty major computational
overhead.  It can also require additional java heap, which ties into
what follows:

Another problem might be RAM-related.

If your java heap is very large, or just a little bit too small, there
can be major performance issues from garbage collection.  Based on the
fact that the earlier version performed well, a too-small heap is more
likely than a very large heap.

If your index size is such that it can't be effectively cached by the
amount of total RAM on the machine (minus the java heap assigned to
Solr), that can cause performance problems.  Your index size is likely
to be several gigabytes, and might even reach double-digit gigabytes.
Can you relate those numbers -- index size, java heap size, and total
system RAM?  If you can, it would also be a good idea to share your
solrconfig.xml.

Here's a wiki page that goes into more detail about possible performance
issues.  It doesn't mention the possible compression problem:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn


RE: Solr add document over 20 times slower after upgrade from 4.0 to 4.9

2014-09-05 Thread Li, Ryan
Hi Erick,

As Ryan Ernst noticed, those big fields (e.g. majorTextSignalStem) are not 
stored. There are a few stored fields in my schema, but they are very small 
fields, basically the name or id of the document. I tried turning them off (only 
storing the id field) and that didn't make any difference.

Thanks,
Ryan

Ryan:

As it happens, there's a discussion on the dev list about this.

If at all possible, could you try a brief experiment? Turn off
all the storage, i.e. set stored=false on all fields. It's a lot
to ask, but it'd help the discussion.

Or join the discussion at https://issues.apache.org/jira/browse/LUCENE-5914.

Best,
Erick




Re: Solr add document over 20 times slower after upgrade from 4.0 to 4.9

2014-09-05 Thread Li, Ryan
Hi Guys,

Just an update.

I've tried Solr 4.10 (same code as for Solr 4.9), and it has the same indexing 
speed as 4.0. The only problem left now is that Solr 4.10 takes more memory 
than 4.0, so I'm trying to figure out the best number for the Java heap size.

I think that proves there is a performance issue in Solr 4.9 when indexing 
big documents (even those just over 1MB).

Thanks,
Ryan


Solr add document over 20 times slower after upgrade from 4.0 to 4.9

2014-09-03 Thread Li, Ryan
I have a Solr server that indexes 2500 documents (up to 50MB each, average 3MB). 
When running Solr 4.0 I managed to finish indexing in 3 hours.

However, after we upgraded to Solr 4.9, indexing needs 3 days to finish.

I've done some profiling, numbers I get are:
Document size (MB)    Add time (Solr 4.0)    Add time (Solr 4.9)
1.18                  6 sec                  123 sec
2.26                  12 sec                 444 sec
3.35                  18 sec                 over 600 sec
9.65                  46 sec                 timeout

From what I can see, indexing time grows roughly linearly with document size on 
Solr 4.0, but much faster than linearly on Solr 4.9. I also tried commenting out some copied fields 
to narrow down the problem; the size of the document after indexing (we copy 
fields, and the more fields we copy, the bigger the index gets) seems to be the 
dominating factor for indexing time.

Just wondering, has anyone experienced a similar problem? Does that sound like a 
bug in Solr, or have we just used Solr 4.9 wrong?

Here is one example of  field definition in my schema file.
<fieldType name="text_stem" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="'+" replacement=""/> <!-- strip off all apostrophe (') characters -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="../../resources/type-index-synonyms.txt"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
        <!-- Used to have language="English" - seems this param is gone in 4.9 -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="'+" replacement=""/> <!-- strip off all apostrophe (') characters -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="../../resources/type-query-colloq-synonyms.txt"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
        <!-- Used to have language="English" - seems this param is gone in 4.9 -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>
Field:
<field name="majorTextSignalStem" type="text_stem" indexed="true" stored="false" multiValued="true" omitNorms="false"/>
Copy:
<copyField dest="majorTextSignalStem" source="majorTextSignalRaw"/>

Thanks,
Ryan



What is the difference between attorney:(Roger Miller) and attorney:Roger Miller

2013-11-19 Thread fudong li
We got different results for these two queries. The first one returned 115
records and the second returned 179 records.

Thanks,

Fudong
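
A likely explanation (an aside for readers of the archive, not from the thread):
the field prefix binds only to the term immediately after it, so the second form
sends Miller to the default field. With debugQuery=true, and assuming the
df/q.op defaults shown elsewhere in this archive (df=all_text, q.op=AND), the
parsed queries would look roughly like:

    attorney:(Roger Miller)   ->   +attorney:roger +attorney:miller
    attorney:Roger Miller     ->   +attorney:roger +all_text:miller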


Re: Custom FunctionQuery Guide/Tutorial (4.3.0+) ?

2013-10-21 Thread fudong li
Hi Jack,

Do you have a date for the new version of your book:
solr_4x_deep_dive_early_access?

Thanks,

Fudong


On Mon, Oct 21, 2013 at 10:39 AM, Jack Krupansky j...@basetechnology.comwrote:

 Take a look at the unit tests for various value sources, and find a Jira
 that added some value source and look at the patch for what changes had to
 be made.

 -- Jack Krupansky

 -Original Message- From: JT
 Sent: Monday, October 21, 2013 1:17 PM
 To: solr-user@lucene.apache.org
 Subject: Custom FunctionQuery Guide/Tutorial (4.3.0+) ?


 Does anyone have a good link to a guide / tutorial /etc. for writing a
 custom function query in Solr 4?

 The tutorials I've seen vary from showing half the code to being written
 for older versions of Solr.


 Any type of pointers would be appreciated, thanks.



stats on dynamic fields?

2013-10-08 Thread Li Xu
Hi,

I don't seem to be able to find any info on the possibility to get stats on
dynamic fields. stats=true&stats.field=xyz_* appears to literally treat
xyz_* as the field name with a star. Is there a way to get stats on
dynamic fields without explicitly listing them in the query?

Thanks!
Li


RE: How to share config files in SolrCloud between multiple cores(collections)

2013-03-20 Thread Li, Qiang
I just want to share the solrconfig.xml and schema.xml, as there will be 
differences between collections in the other files, such as the DIH 
configurations.

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Tuesday, March 19, 2013 11:19 AM
To: solr-user@lucene.apache.org
Subject: Re: How to share config files in SolrCloud between multiple 
cores(collections)

To share configs in SolrCloud you just upload a single config set and then link 
it to multiple collections. You don't actually use solr.xml to do it.

- Mark

On Mar 19, 2013, at 10:43 AM, Li, Qiang qiang...@msci.com wrote:

 We have multiple cores with the same configuration. Before using SolrCloud, 
 we could use relative paths in solr.xml, but with Solr 4 it seems that relative 
 paths for the schema and config are no longer allowed in solr.xml.

 Regards,
 Ivan



How to share config files in SolrCloud between multiple cores(collections)

2013-03-19 Thread Li, Qiang
We have multiple cores with the same configuration. Before using SolrCloud, we 
could use relative paths in solr.xml, but with Solr 4 it seems that relative paths 
for the schema and config are no longer allowed in solr.xml.

Regards,
Ivan



Re: build CMIS compatible Solr

2013-01-20 Thread Nicholas Li
I think this might be the one you are talking about:
https://github.com/sourcesense/solr-cmis

But I think Alfresco already has search functionality similar to Solr.
Then why did you want to use it to index docs out of Alfresco?

On Fri, Jan 18, 2013 at 8:00 PM, Upayavira u...@odoko.co.uk wrote:

 A colleague of mine when I was working for Sourcesense made a CMIS
 plugin for Solr. It was one way, and we used it to index stuff out of
 Alfresco into Solr. I can't search for it now, let me know if you can't
 find it.

 Upayavira

 On Fri, Jan 18, 2013, at 05:35 AM, Nicholas Li wrote:
  I want to make something like Alfresco, but not having that many
  features.
  And I'd like to utilise the searching ability of Solr.
 
  On Fri, Jan 18, 2013 at 4:11 PM, Gora Mohanty g...@mimirtech.com
 wrote:
 
   On 18 January 2013 10:36, Nicholas Li nicholas...@yarris.com wrote:
hi
   
I am new to solr and I would like to use Solr as my document server,
 plus
search engine. But solr is not CMIS compatible( While it shoud not
 be, as
it is not build as a pure document management server).  In that
 sense, I
would build another layer beyond Solr so that the exposed interface
 would
be CMIS compatible.
   [...]
  
   May I ask why? Solr is designed to be a search engine,
   which is a very different beast from a document repository.
   In the open-source world, Alfresco ( http://www.alfresco.com/ )
   already exists, can index into Solr, and supports CMIS-based
   access.
  
   Regards,
   Gora
  



build CMIS compatible Solr

2013-01-17 Thread Nicholas Li
hi

I am new to Solr and I would like to use Solr as my document server plus
search engine. But Solr is not CMIS compatible (while it need not be, as
it is not built as a pure document management server). For that reason, I
would build another layer on top of Solr so that the exposed interface would
be CMIS compatible.

I did some investigation and it looks like OpenCMIS is one of the choices. My
next step would be to build this CMIS bridge layer, which can marshal the
request as a CMIS request, then within the CMIS implementation marshal the
request as a Solr-compatible request and send it to Solr, and finally marshal the
Solr response into a CMIS-compatible response.

Is my logic right?

And is there any library other than OpenCMIS to do this job?

cheers.
Nick


Re: build CMIS compatible Solr

2013-01-17 Thread Nicholas Li
I want to make something like Alfresco, but not having that many features.
And I'd like to utilise the searching ability of Solr.

On Fri, Jan 18, 2013 at 4:11 PM, Gora Mohanty g...@mimirtech.com wrote:

 On 18 January 2013 10:36, Nicholas Li nicholas...@yarris.com wrote:
  hi
 
  I am new to solr and I would like to use Solr as my document server, plus
  search engine. But solr is not CMIS compatible( While it shoud not be, as
  it is not build as a pure document management server).  In that sense, I
  would build another layer beyond Solr so that the exposed interface would
  be CMIS compatible.
 [...]

 May I ask why? Solr is designed to be a search engine,
 which is a very different beast from a document repository.
 In the open-source world, Alfresco ( http://www.alfresco.com/ )
 already exists, can index into Solr, and supports CMIS-based
 access.

 Regards,
 Gora



Store document while using Solr

2012-12-20 Thread Nicholas Li
hi there,

I am quite new to Solr and have a very basic question about storing and
indexing the document.

I am trying out the Solr example, and when I run a command like 'java -jar
post.jar foo/test.xml', it gives me the feeling that Solr will index the
given file no matter where it is stored, and Solr won't re-store this file
in some other location in the file system. Am I correct?

If I want to use the file system to manage the documents, it seems like it is better
to define some location that will be used to store all the potential
files (which may need some processing to move/copy/upload the files to this
location), and then use Solr to index them under this location. Am I correct?

Cheers,
Nick


Index version generation for Solr 3.5

2012-08-22 Thread Xin Li
Hi,

I ran into an issue lately with index version generation for Solr 3.5.

In Solr 1.4, the index version on the slave increments upon each
replication. However, I noticed that this is not the case for Solr 3.5; the
index version would increase by 20 or 30 after a replication. Does anyone
know why, and is there any reference on the web for this?
The index generation does still increment after replication though.

Thanks,

Xin


Re: Atomic Multicore Operations - E.G. Move Docs

2012-08-15 Thread Li Li
On 2012-7-2 6:37 PM, Nicholas Ball nicholas.b...@nodelay.com wrote:


 That could work, but then how do you ensure commit is called on the two
 cores at the exact same time?
That may need something like the two-phase commit used in relational DBs. Lucene
has prepareCommit, but to implement 2PC many more things need to be done.
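
A rough sketch of what that two-phase pattern looks like at the Lucene level
(illustrative only and not from this thread: plain Solr does not hand you the two
cores' IndexWriters like this, and a crash between the two phases still needs a
durable coordinator log to recover from):

    import java.io.IOException;
    import org.apache.lucene.index.IndexWriter;

    public class TwoCoreCommit {
        /** Best-effort atomic commit across two writers. */
        public static void commitBoth(IndexWriter a, IndexWriter b) throws IOException {
            try {
                a.prepareCommit();   // phase 1: flush + fsync, changes not yet visible
                b.prepareCommit();
                a.commit();          // phase 2: publish both
                b.commit();
            } catch (IOException e) {
                // rollback() discards prepared-but-uncommitted changes and closes the writer.
                // Note: if the failure happens after a.commit() succeeded, "a" cannot be
                // un-committed here -- closing that gap is exactly what a full 2PC protocol
                // (coordinator log, recovery) is for.
                a.rollback();
                b.rollback();
                throw e;
            }
        }
    }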
 Also, any way to commit a specific update rather then all the back-logged
 ones?

 Cheers,
 Nicholas

 On Sat, 30 Jun 2012 16:19:31 -0700, Lance Norskog goks...@gmail.com
 wrote:
  Index all documents to both cores, but do not call commit until both
  report that indexing worked. If one of the cores throws an exception,
  call roll back on both cores.
 
  On Sat, Jun 30, 2012 at 6:50 AM, Nicholas Ball
  nicholas.b...@nodelay.com wrote:
 
  Hey all,
 
  Trying to figure out the best way to perform atomic operation across
  multiple cores on the same solr instance i.e. a multi-core environment.
 
  An example would be to move a set of docs from one core onto another
 core
  and ensure that a softcommit is done as the exact same time. If one
 were
  to
  fail so would the other.
  Obviously this would probably require some customization but wanted to
  know what the best way to tackle this would be and where should I be
  looking in the source.
 
  Many thanks for the help in advance,
  Nicholas a.k.a. incunix


Re: Atomic Multicore Operations - E.G. Move Docs

2012-08-15 Thread Li Li
Do you really need this?
Distributed transactions are a difficult problem. In 2PC every node can
fail, including the coordinator, so something like leader election is needed to make
sure it works; you could try ZooKeeper for that.
But if the transaction is not critically important (like transferring money in a
bank), you can do something like this.
coordinator:
On 2012-8-16 7:42 AM, Nicholas Ball nicholas.b...@nodelay.com wrote:


 Haven't managed to find a good way to do this yet. Does anyone have any
 ideas on how I could implement this feature?
 Really need to move docs across from one core to another atomically.

 Many thanks,
 Nicholas

 On Mon, 02 Jul 2012 04:37:12 -0600, Nicholas Ball
 nicholas.b...@nodelay.com wrote:
  That could work, but then how do you ensure commit is called on the two
  cores at the exact same time?
 
  Cheers,
  Nicholas
 
  On Sat, 30 Jun 2012 16:19:31 -0700, Lance Norskog goks...@gmail.com
  wrote:
  Index all documents to both cores, but do not call commit until both
  report that indexing worked. If one of the cores throws an exception,
  call roll back on both cores.
 
  On Sat, Jun 30, 2012 at 6:50 AM, Nicholas Ball
  nicholas.b...@nodelay.com wrote:
 
  Hey all,
 
  Trying to figure out the best way to perform atomic operation across
  multiple cores on the same solr instance i.e. a multi-core
 environment.
 
  An example would be to move a set of docs from one core onto another
  core
  and ensure that a softcommit is done as the exact same time. If one
  were
  to
  fail so would the other.
  Obviously this would probably require some customization but wanted to
  know what the best way to tackle this would be and where should I be
  looking in the source.
 
  Many thanks for the help in advance,
  Nicholas a.k.a. incunix



Re: Atomic Multicore Operations - E.G. Move Docs

2012-08-15 Thread Li Li
http://zookeeper.apache.org/doc/r3.3.6/recipes.html#sc_recipes_twoPhasedCommit

On Thu, Aug 16, 2012 at 7:41 AM, Nicholas Ball
nicholas.b...@nodelay.com wrote:

 Haven't managed to find a good way to do this yet. Does anyone have any
 ideas on how I could implement this feature?
 Really need to move docs across from one core to another atomically.

 Many thanks,
 Nicholas

 On Mon, 02 Jul 2012 04:37:12 -0600, Nicholas Ball
 nicholas.b...@nodelay.com wrote:
 That could work, but then how do you ensure commit is called on the two
 cores at the exact same time?

 Cheers,
 Nicholas

 On Sat, 30 Jun 2012 16:19:31 -0700, Lance Norskog goks...@gmail.com
 wrote:
 Index all documents to both cores, but do not call commit until both
 report that indexing worked. If one of the cores throws an exception,
 call roll back on both cores.

 On Sat, Jun 30, 2012 at 6:50 AM, Nicholas Ball
 nicholas.b...@nodelay.com wrote:

 Hey all,

 Trying to figure out the best way to perform atomic operation across
 multiple cores on the same solr instance i.e. a multi-core
 environment.

 An example would be to move a set of docs from one core onto another
 core
 and ensure that a softcommit is done as the exact same time. If one
 were
 to
 fail so would the other.
 Obviously this would probably require some customization but wanted to
 know what the best way to tackle this would be and where should I be
 looking in the source.

 Many thanks for the help in advance,
 Nicholas a.k.a. incunix


Re: how to boost exact match

2012-08-10 Thread Li Li
Create a field for exact match; use it as an optional boolean clause.
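
A minimal sketch of that idea (all the names here are invented, not from the
thread): copy the title into a lightly-analyzed field and add a boosted phrase
clause on it, so "iphone 4" outranks "iphone 4s" without changing the main field.

    <!-- schema.xml -->
    <fieldType name="text_exactish" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <field name="title_exact" type="text_exactish" indexed="true" stored="false"/>
    <copyField source="title" dest="title_exact"/>

The query then becomes something like q=title:(iphone 4) OR title_exact:"iphone 4"^10
(or a bq clause with dismax/edismax) - the boosted clause is the optional boolean
clause mentioned above.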
On 2012-8-11 1:42 PM, abhayd ajdabhol...@hotmail.com wrote:

 hi

 I have documents like
 iphone 4 - white
 iphone 4s - black
 ipone4 - black

 when user searches for iphone 4 i would like to show iphone 4 docs first
 and
 iphone 4s after that.
 Similary when user is searching for iphone 4s i would like to show iphone
 4s
 docs first then iphone 4 docs.

 At present i use whitespace tokenizer. Any idea how to achieve this?





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/how-to-boost-exact-match-tp4000576.html
 Sent from the Solr - User mailing list archive at Nabble.com.



filed type for text search

2012-07-24 Thread Xiao Li
I have used Solr 3.4 for a long time. Recently, when I upgraded to Solr 4.0
and reindexed all the data, I found that fields specified as
string type cannot be searched via the q parameter. If I just change the type
to text_general, it works. So my question is: for Solr 4.0, must I set the
field type to text_general for text search?


Search special chars

2012-07-23 Thread Li, Qiang
Hi All,

I want to search some keywords like Non-taxable, which have a "-" in the word. 
Can I make this work in Solr through some configuration, or in any other way?

Thanks & Regards,
Ivan



Re: Solr seems to hang

2012-06-28 Thread Li Li
could you please use jstack to dump the call stacks?

On Thu, Jun 28, 2012 at 2:53 PM, Arkadi Colson ark...@smartbit.be wrote:
 It has now been hanging for 15 hours and nothing changes in the index directory.

 Tips for further debugging?


 On 06/27/2012 03:50 PM, Arkadi Colson wrote:

 I'm sending files to solr with the php Solr library. I'm doing a commit
 every 1000 documents:
       <autoCommit>
         <maxDocs>1000</maxDocs>
 <!--    <maxTime>1000</maxTime> -->
       </autoCommit>

 Hard to say how long it's hanging. At least for 1 hour. After that I
 restarted Tomcat to continue... I will have a look at the indexes next time
 it's hanging. Thanks for the tip!

 SOLR: 3.6
 TOMCAT: 7.0.28
 JAVA: 1.7.0_05-b05


 On 06/27/2012 03:13 PM, Erick Erickson wrote:

 How long is it hanging? And how are you sending files to Tika, and
 especially how often do you commit? One problem that people
 run into is that they commit too often, causing segments to be
 merged and occasionally that just takes a while and people
 think that Solr is hung.

 18G isn't very large as indexes go, so it's unlikely that's your problem,
 except if merging is going on in which case you might be copying a bunch
 of data. So try seeing if you're getting a bunch of disk activity, you
 can get
 a crude idea of what's going on if you just look at the index directory
 on
 your Solr server while it's hung.

 What version of Solr are you using? Details matter

 Best
 Erick

 On Wed, Jun 27, 2012 at 7:51 AM, Arkadi Colson ark...@smartbit.be
 wrote:

 Anybody an idea?

 The thread Dump looks like this:

 Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed
 mode):

 http-8983-6 daemon prio=10 tid=0x41126000 nid=0x5c1 in
 Object.wait() [0x7fa0ad197000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on 0x00070abf4ad0 (a
 org.apache.tomcat.util.net.JIoEndpoint$Worker)
        at java.lang.Object.wait(Object.java:485)
        at

 org.apache.tomcat.util.net.JIoEndpoint$Worker.await(JIoEndpoint.java:458)
        - locked 0x00070abf4ad0 (a
 org.apache.tomcat.util.net.JIoEndpoint$Worker)
        at
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:484)
        at java.lang.Thread.run(Thread.java:662)

 pool-4-thread-1 prio=10 tid=0x7fa0a054d800 nid=0x5be waiting on
 condition [0x7f9f962f4000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  0x000702598b30 (a
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at

 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.DelayQueue.take(DelayQueue.java:160)
        at

 java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
        at

 java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
        at

 java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at

 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:662)

 http-8983-5 daemon prio=10 tid=0x412d2800 nid=0x5bd runnable
 [0x7f9f94171000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at

 org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:735)
        at

 org.apache.coyote.http11.InternalInputBuffer.parseRequestLine(InternalInputBuffer.java:366)
        at

 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:814)
        at

 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
        at
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:662)

 http-8983-4 daemon prio=10 tid=0x41036000 nid=0x5b1 in
 Object.wait() [0x7f9f966c9000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on 0x00070b6e4790 (a
 org.apache.lucene.index.DocumentsWriter)
        at java.lang.Object.wait(Object.java:485)
        at

 org.apache.lucene.index.DocumentsWriter.waitIdle(DocumentsWriter.java:986)
        - locked 0x00070b6e4790 (a
 org.apache.lucene.index.DocumentsWriter)
        at
 org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:524)
        - locked 0x00070b6e4790 (a
 org.apache.lucene.index.DocumentsWriter)
        at
 org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3580)
        - locked 0x00070b6e4858 (a
 

Re: what is precisionStep and positionIncrementGap

2012-06-28 Thread Li Li
Read the "How it works" section of
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/NumericRangeQuery.html
if you can read Chinese, I have a blog explaining the details of the
implementation.
http://blog.csdn.net/fancyerii/article/details/7256379

On Thu, Jun 28, 2012 at 3:51 PM, ZHANG Liang F
liang.f.zh...@alcatel-sbell.com.cn wrote:
 Thanks a lot, but the precisionStep is still very vague to me! Could you give 
 me a example?

 -Original Message-
 From: Li Li [mailto:fancye...@gmail.com]
 Sent: 2012-06-28 11:25
 To: solr-user@lucene.apache.org
 Subject: Re: what is precisionStep and positionIncrementGap

 1. precisionStep is used for ranging query of Numeric Fields. see 
 http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/NumericRangeQuery.html
 2. positionIncrementGap is used for phrase query of multi-value fields e.g. 
 doc1 has two titles.
   title1: ab cd
   title2: xy zz
   if your positionIncrementGap is 0, then the position of the 4 terms are 
 0,1,2,3.
   if you search phrase cd xy, it will hit. But you may think it should not 
 match
   so you can adjust positionIncrementGap to a larger one. e.g. 100.
   Then the positions now are 0,1,100,101. the phrase query will not match it.

 On Thu, Jun 28, 2012 at 10:00 AM, ZHANG Liang F 
 liang.f.zh...@alcatel-sbell.com.cn wrote:
 Hi,
 in the schema.xml, usually there will be fieldType definition like
 this: <fieldType name="int" class="solr.TrieIntField"
 precisionStep="0" omitNorms="true" positionIncrementGap="0"/>

 the precisionStep and positionIncrementGap is not very clear to me. Could 
 you please elaborate more on these 2?

 Thanks!

 Liang


Re: Solr seems to hang

2012-06-27 Thread Li Li
It seems that the IndexWriter wants to flush but needs to wait for the others to become
idle. But I see the n-gram filter is working. Is your field's value too
long? You should also tell us the average load of the system, the free memory, and the
memory used by the JVM.
On 2012-6-27 7:51 PM, Arkadi Colson ark...@smartbit.be wrote:

 Anybody an idea?

 The thread Dump looks like this:

 Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode):

 http-8983-6 daemon prio=10 tid=0x41126000 nid=0x5c1 in
 Object.wait() [0x7fa0ad197000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on 0x00070abf4ad0 (a org.apache.tomcat.util.net.JIoEndpoint$Worker)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.await(JIoEndpoint.java:458)
        - locked 0x00070abf4ad0 (a org.apache.tomcat.util.net.JIoEndpoint$Worker)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:484)
        at java.lang.Thread.run(Thread.java:662)

 pool-4-thread-1 prio=10 tid=0x7fa0a054d800 nid=0x5be waiting on
 condition [0x7f9f962f4000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for 0x000702598b30 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
        at java.util.concurrent.DelayQueue.take(DelayQueue.java:160)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:662)

 http-8983-5 daemon prio=10 tid=0x412d2800 nid=0x5bd runnable
 [0x7f9f94171000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:735)
        at org.apache.coyote.http11.InternalInputBuffer.parseRequestLine(InternalInputBuffer.java:366)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:814)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:662)

 http-8983-4 daemon prio=10 tid=0x41036000 nid=0x5b1 in
 Object.wait() [0x7f9f966c9000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on 0x00070b6e4790 (a org.apache.lucene.index.DocumentsWriter)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.lucene.index.DocumentsWriter.waitIdle(DocumentsWriter.java:986)
        - locked 0x00070b6e4790 (a org.apache.lucene.index.DocumentsWriter)
        at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:524)
        - locked 0x00070b6e4790 (a org.apache.lucene.index.DocumentsWriter)
        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3580)
        - locked 0x00070b6e4858 (a org.apache.solr.update.SolrIndexWriter)
        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3545)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2328)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2293)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:240)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:141)
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:146)
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:236)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
  

Re: what is precisionStep and positionIncrementGap

2012-06-27 Thread Li Li
1. precisionStep is used for ranging query of Numeric Fields. see
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/NumericRangeQuery.html
2. positionIncrementGap is used for phrase query of multi-value fields
e.g. doc1 has two titles.
   title1: ab cd
   title2: xy zz
   if your positionIncrementGap is 0, then the position of the 4 terms
are 0,1,2,3.
   if you search phrase cd xy, it will hit. But you may think it
should not match
   so you can adjust positionIncrementGap to a larger one. e.g. 100.
   Then the positions now are 0,1,100,101. the phrase query will not match it.
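
To make both points concrete, a sketch (the type and field names below follow the
stock example schema and this thread's example, not any real deployment):

    <!-- precisionStep: "0" in Solr means index each value at a single precision
         (fine for sorting and exact lookups); a small step such as 8 also indexes
         lower-precision terms, so numeric range queries like price:[100 TO 500]
         have far fewer terms to visit. -->
    <fieldType name="int"  class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>

    <!-- positionIncrementGap: with the two title values "ab cd" and "xy zz",
         the gap of 100 pushes "xy" far away from "cd", so the phrase query
         title:"cd xy" no longer matches across the value boundary
         (a sloppy phrase such as title:"cd xy"~100 still would). -->
    <fieldType name="text_gap" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <field name="title" type="text_gap" indexed="true" stored="true" multiValued="true"/>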

On Thu, Jun 28, 2012 at 10:00 AM, ZHANG Liang F
liang.f.zh...@alcatel-sbell.com.cn wrote:
 Hi,
 in the schema.xml, usually there will be fieldType definition like this: 
 <fieldType name="int" class="solr.TrieIntField" precisionStep="0" 
 omitNorms="true" positionIncrementGap="0"/>

 the precisionStep and positionIncrementGap is not very clear to me. Could you 
 please elaborate more on these 2?

 Thanks!

 Liang


Re: Query Logic Question

2012-06-27 Thread Li Li
I think they are logically the same, but 1 may be a little bit faster than 2.
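
One caveat worth adding for readers of the archive (an aside, not from the
thread): in Lucene a parenthesised sub-query that contains only negative clauses
matches nothing, and Solr only adds the implicit *:* for a purely negative query
at the top level, which would explain why form 2 returns 0 results. Spelling
the positive set out restores the intended meaning:

    (*:* -PaymentType:Finance -PaymentType:Lease) AND -PaymentType:Cash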

On Thu, Jun 28, 2012 at 5:59 AM, Rublex ruble...@hotmail.com wrote:
 Hi,

 Can someone explain to me please why these two queries return different
 results:

 1. -PaymentType:Finance AND -PaymentType:Lease AND -PaymentType:Cash *(700
 results)*

 2. (-PaymentType:Finance AND -PaymentType:Lease) AND -PaymentType:Cash *(0
 results)*

 Logically the two above queries should be return the same results no?

 Thank you

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Query-Logic-Question-tp3991689.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: what's better for in memory searching?

2012-06-11 Thread Li Li
I have roughly read the code of RAMDirectory. It uses a list of 1024-byte
arrays and has a lot of overhead.
But as far as I know, using MMapDirectory I can't prevent page
faults: the OS will swap less frequently used pages out. Even if I allocate
enough memory for the JVM, I can't guarantee all the files in the directory
stay in memory. Am I understanding this right? If so, then some less
frequent queries will be slow. How can I keep them always in memory?
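
For reference, a minimal Lucene 3.6-style sketch of the MMapDirectory route (the
path is made up); keeping the index hot is then a matter of the OS page cache and
warm-up queries rather than anything inside Lucene:

    import java.io.File;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.MMapDirectory;

    public class MMapSearchSketch {
        public static void main(String[] args) throws Exception {
            MMapDirectory dir = new MMapDirectory(new File("/data/myindex"));
            IndexReader reader = IndexReader.open(dir);     // read-only reader over the mmapped files
            IndexSearcher searcher = new IndexSearcher(reader);
            // Run a few representative warm-up queries here so the OS page cache is
            // populated before real traffic arrives; whether the pages stay resident
            // is up to the OS (swappiness, memory pressure), not Lucene.
            searcher.close();
            reader.close();
            dir.close();
        }
    }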

On Fri, Jun 8, 2012 at 5:53 PM, Lance Norskog goks...@gmail.com wrote:
 Yes, use MMapDirectory. It is faster and uses memory more efficiently
 than RAMDirectory. This sounds wrong, but it is true. With
 RAMDirectory, Java has to work harder doing garbage collection.

 On Fri, Jun 8, 2012 at 1:30 AM, Li Li fancye...@gmail.com wrote:
 hi all
   I want to use lucene 3.6 providing searching service. my data is
 not very large, raw data is less that 1GB and I want to use load all
 indexes into memory. also I need save all indexes into disk
 persistently.
   I originally want to use RAMDirectory. But when I read its javadoc.

   Warning: This class is not intended to work with huge indexes.
 Everything beyond several hundred megabytes
  will waste resources (GC cycles), because it uses an internal buffer
 size of 1024 bytes, producing millions of byte
  [1024] arrays. This class is optimized for small memory-resident
 indexes. It also has bad concurrency on
  multithreaded environments.
 It is recommended to materialize large indexes on disk and use
 MMapDirectory, which is a high-performance
  directory implementation working directly on the file system cache of
 the operating system, so copying data to
  Java heap space is not useful.

    should I use MMapDirectory? it seems another contrib instantiated.
 anyone test it with RAMDirectory?



 --
 Lance Norskog
 goks...@gmail.com


Re: what's better for in memory searching?

2012-06-11 Thread Li Li
Do you mean a software RAM disk, i.e. using RAM to simulate a disk? How would you deal
with persistence?

Maybe I can hack this by increasing RAMOutputStream.BUFFER_SIZE from 1024 to 1024*1024.
That may waste some space, but I can adjust my merge policy to avoid too many segments.
I will have one big segment and one small segment. Every night I will
merge them. Newly added documents will flush into a new segment, and I
will merge that newly generated segment with the small one.
Our update operations are not very frequent.

On Mon, Jun 11, 2012 at 4:59 PM, Paul Libbrecht p...@hoplahup.net wrote:
 Li Li,

 have you considered allocating a RAM-Disk?
 It's not the most flexible thing... but it's certainly close, in performance 
 to a RAMDirectory.
 MMapping on that is likely to be useless but I doubt you can set it to zero.
 That'd need experiment.

 Also, doesn't caching and auto-warming provide the lowest latency for all 
 expected queries ?

 Paul


 Le 11 juin 2012 à 10:50, Li Li a écrit :

   I want to use lucene 3.6 providing searching service. my data is
 not very large, raw data is less that 1GB and I want to use load all
 indexes into memory. also I need save all indexes into disk
 persistently.
   I originally want to use RAMDirectory. But when I read its javadoc.




Re: what's better for in memory searching?

2012-06-11 Thread Li Li
I am sorry, I made a mistake: even using RAMDirectory, I cannot
guarantee they are not swapped out.

On Mon, Jun 11, 2012 at 4:45 PM, Michael Kuhlmann k...@solarier.de wrote:
 Set the swapiness to 0 to avoid memory pages being swapped to disk too
 early.

 http://en.wikipedia.org/wiki/Swappiness

 -Kuli

 Am 11.06.2012 10:38, schrieb Li Li:

 I have roughly read the codes of RAMDirectory. it use a list of 1024
 byte arrays and many overheads.
 But as far as I know, using MMapDirectory, I can't prevent the page
 faults. OS will swap less frequent pages out. Even if I allocate
 enough memory for JVM, I can guarantee all the files in the directory
 are in memory. am I understanding right? if it is, then some less
 frequent queries will be slow.  How can I let them always in memory?

 On Fri, Jun 8, 2012 at 5:53 PM, Lance Norskoggoks...@gmail.com  wrote:

 Yes, use MMapDirectory. It is faster and uses memory more efficiently
 than RAMDirectory. This sounds wrong, but it is true. With
 RAMDirectory, Java has to work harder doing garbage collection.

 On Fri, Jun 8, 2012 at 1:30 AM, Li Lifancye...@gmail.com  wrote:

 hi all
   I want to use lucene 3.6 providing searching service. my data is
 not very large, raw data is less that 1GB and I want to use load all
 indexes into memory. also I need save all indexes into disk
 persistently.
   I originally want to use RAMDirectory. But when I read its javadoc.

   Warning: This class is not intended to work with huge indexes.
 Everything beyond several hundred megabytes
  will waste resources (GC cycles), because it uses an internal buffer
 size of 1024 bytes, producing millions of byte
  [1024] arrays. This class is optimized for small memory-resident
 indexes. It also has bad concurrency on
  multithreaded environments.
 It is recommended to materialize large indexes on disk and use
 MMapDirectory, which is a high-performance
  directory implementation working directly on the file system cache of
 the operating system, so copying data to
  Java heap space is not useful.

    should I use MMapDirectory? it seems another contrib instantiated.
 anyone test it with RAMDirectory?




 --
 Lance Norskog
 goks...@gmail.com




Re: what's better for in memory searching?

2012-06-11 Thread Li Li
Yes, I need an average query time of less than 10 ms; the faster the better.
I have enough memory for Lucene because I know there is not too much
data, and there are not many modifications: every day there are about a few
hundred document updates. If the indexes are not in physical memory,
then IO operations will cost a few ms.
Btw, a full GC may also add uncertainty, so I need to optimize as
much as possible.
On Mon, Jun 11, 2012 at 5:27 PM, Michael Kuhlmann k...@solarier.de wrote:
 You cannot guarantee this when you're running out of RAM. You'd have a
 problem then anyway.

 Why are you caring that much? Did you yet have performance issues? 1GB
 should load really fast, and both auto warming and OS cache should help a
 lot as well. With such an index, you usually don't need to fine tune
 performance that much.

 Did you think about using a SSD? Since you want to persist your index,
 you'll need to live with disk IO anyway.

 Greetings,
 Kuli

 Am 11.06.2012 11:20, schrieb Li Li:

 I am sorry. I make a mistake. even use RAMDirectory, I can not
 guarantee they are not swapped out.

 On Mon, Jun 11, 2012 at 4:45 PM, Michael Kuhlmannk...@solarier.de
  wrote:

 Set the swapiness to 0 to avoid memory pages being swapped to disk too
 early.

 http://en.wikipedia.org/wiki/Swappiness

 -Kuli

 Am 11.06.2012 10:38, schrieb Li Li:

 I have roughly read the codes of RAMDirectory. it use a list of 1024
 byte arrays and many overheads.
 But as far as I know, using MMapDirectory, I can't prevent the page
 faults. OS will swap less frequent pages out. Even if I allocate
 enough memory for JVM, I can guarantee all the files in the directory
 are in memory. am I understanding right? if it is, then some less
 frequent queries will be slow.  How can I let them always in memory?

 On Fri, Jun 8, 2012 at 5:53 PM, Lance Norskoggoks...@gmail.com
  wrote:


 Yes, use MMapDirectory. It is faster and uses memory more efficiently
 than RAMDirectory. This sounds wrong, but it is true. With
 RAMDirectory, Java has to work harder doing garbage collection.

 On Fri, Jun 8, 2012 at 1:30 AM, Li Lifancye...@gmail.com    wrote:


 hi all
   I want to use lucene 3.6 providing searching service. my data is
 not very large, raw data is less that 1GB and I want to use load all
 indexes into memory. also I need save all indexes into disk
 persistently.
   I originally want to use RAMDirectory. But when I read its javadoc.

   Warning: This class is not intended to work with huge indexes.
 Everything beyond several hundred megabytes
  will waste resources (GC cycles), because it uses an internal buffer
 size of 1024 bytes, producing millions of byte
  [1024] arrays. This class is optimized for small memory-resident
 indexes. It also has bad concurrency on
  multithreaded environments.
 It is recommended to materialize large indexes on disk and use
 MMapDirectory, which is a high-performance
  directory implementation working directly on the file system cache of
 the operating system, so copying data to
  Java heap space is not useful.

    should I use MMapDirectory? it seems another contrib instantiated.
 anyone test it with RAMDirectory?





 --
 Lance Norskog
 goks...@gmail.com






Re: what's better for in memory searching?

2012-06-11 Thread Li Li
I found this. 
http://unix.stackexchange.com/questions/10214/per-process-swapiness-for-linux
it can provide  fine grained control of swapping

On Mon, Jun 11, 2012 at 4:45 PM, Michael Kuhlmann k...@solarier.de wrote:
 Set the swapiness to 0 to avoid memory pages being swapped to disk too
 early.

 http://en.wikipedia.org/wiki/Swappiness

 -Kuli

 Am 11.06.2012 10:38, schrieb Li Li:

 I have roughly read the codes of RAMDirectory. it use a list of 1024
 byte arrays and many overheads.
 But as far as I know, using MMapDirectory, I can't prevent the page
 faults. OS will swap less frequent pages out. Even if I allocate
 enough memory for JVM, I can guarantee all the files in the directory
 are in memory. am I understanding right? if it is, then some less
 frequent queries will be slow.  How can I let them always in memory?

 On Fri, Jun 8, 2012 at 5:53 PM, Lance Norskoggoks...@gmail.com  wrote:

 Yes, use MMapDirectory. It is faster and uses memory more efficiently
 than RAMDirectory. This sounds wrong, but it is true. With
 RAMDirectory, Java has to work harder doing garbage collection.

 On Fri, Jun 8, 2012 at 1:30 AM, Li Lifancye...@gmail.com  wrote:

 hi all
   I want to use lucene 3.6 providing searching service. my data is
 not very large, raw data is less that 1GB and I want to use load all
 indexes into memory. also I need save all indexes into disk
 persistently.
   I originally want to use RAMDirectory. But when I read its javadoc.

   Warning: This class is not intended to work with huge indexes.
 Everything beyond several hundred megabytes
  will waste resources (GC cycles), because it uses an internal buffer
 size of 1024 bytes, producing millions of byte
  [1024] arrays. This class is optimized for small memory-resident
 indexes. It also has bad concurrency on
  multithreaded environments.
 It is recommended to materialize large indexes on disk and use
 MMapDirectory, which is a high-performance
  directory implementation working directly on the file system cache of
 the operating system, so copying data to
  Java heap space is not useful.

    should I use MMapDirectory? it seems another contrib instantiated.
 anyone test it with RAMDirectory?




 --
 Lance Norskog
 goks...@gmail.com




Re: what's better for in memory searching?

2012-06-11 Thread Li Li
Is this method equivalent to setting vm.swappiness, which is global?
Or can it set the swappiness for just the JVM process?

On Tue, Jun 12, 2012 at 5:11 AM, Mikhail Khludnev
mkhlud...@griddynamics.com wrote:
 Point about premature optimization makes sense for me. However some time
 ago I've bookmarked potentially useful approach
 http://lucene.472066.n3.nabble.com/High-response-time-after-being-idle-tp3616599p3617604.html.

 On Mon, Jun 11, 2012 at 3:02 PM, Toke Eskildsen 
 t...@statsbiblioteket.dkwrote:

 On Mon, 2012-06-11 at 11:38 +0200, Li Li wrote:
  yes, I need average query time less than 10 ms. The faster the better.
  I have enough memory for lucene because I know there are not too much
  data. there are not many modifications. every day there are about
  hundreds of document update. if indexes are not in physical memory,
  then IO operations will cost a few ms.

 I'm with Michael on this one: It seems that you're doing a premature
 optimization. Guessing that your final index will be  5GB in size with
 1 million documents (give or take 900.000:-), relatively simple queries
 and so on, an average response time of 10 ms should be attainable even
 on spinning drives. One hundred document updates per day are not many,
 so again I would not expect problems.

 As is often the case on this mailing list, the advice is try it. Using
 a normal on-disk index and doing some warm up is the easy solution to
 implement and nearly all of your work on this will be usable for a
 RAM-based solution, if you are not satisfied with the speed. Or you
 could buy a small  cheap SSD and have no more worries...

 Regards,
 Toke Eskildsen




 --
 Sincerely yours
 Mikhail Khludnev
 Tech Lead
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com


Re: [Announce] Solr 3.6 with RankingAlgorithm 1.4.2 - NRT support

2012-05-27 Thread Li Li
Yes, I am also interested in good performance with 2 billion docs. How
many search nodes do you use? What are the average response time and QPS?

Another question: where can I find a related paper or other resources that
explain your algorithm in detail? Why is it better than
Google site search? (Being better than Lucene is not very interesting, because Lucene
was not originally designed to provide a Google-like search function.)

On Mon, May 28, 2012 at 1:06 AM, Darren Govoni dar...@ontrenet.com wrote:
 I think people on this list would be more interested in your approach to
 scaling 2 billion documents than modifying solr/lucene scoring (which is
 already top notch). So given that, can you share any references or
 otherwise substantiate good performance with 2 billion documents?

 Thanks.

 On Sun, 2012-05-27 at 08:29 -0700, Nagendra Nagarajayya wrote:
 Actually, RankingAlgorithm 1.4.2 has been scaled to more than 2 billion
 docs. With RankingAlgorithm 1.4.3, using the parameters
 age=latestdocs=number feature, you can retrieve the NRT inserted
 documents in milliseconds from such a huge index improving query and
 faceting performance and using very little resources ...

 Currently, RankingAlgorithm 1.4.3 is only available with Solr 4.0, and
 the NRT insert performance with Solr 4.0 is about 70,000 docs / sec.
 RankingAlgorithm 1.4.3 should become available with Solr 3.6 soon.

 Regards,

 Nagendra Nagarajayya
 http://solr-ra.tgels.org
 http://rankingalgorithm.tgels.org



 On 5/27/2012 7:32 AM, Darren Govoni wrote:
  Hi,
     Have you tested this with a billion documents?
 
  Darren
 
  On Sun, 2012-05-27 at 07:24 -0700, Nagendra Nagarajayya wrote:
  Hi!
 
  I am very excited to announce the availability of Solr 3.6 with
  RankingAlgorithm 1.4.2.
 
  This NRT supports now works with both RankingAlgorithm and Lucene. The
  insert/update performance should be about 5000 docs in about 490 ms with
  the MbArtists Index.
 
  RankingAlgorithm 1.4.2 has multiple algorithms, improved performance
  over the earlier releases, supports the entire Lucene Query Syntax, ±
  and/or boolean queries and can scale to more than a billion documents.
 
  You can get more information about NRT performance from here:
  http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_3.x
 
  You can download Solr 3.6 with RankingAlgorithm 1.4.2 from here:
  http://solr-ra.tgels.org
 
  Please download and give the new version a try.
 
  Regards,
 
  Nagendra Nagarajayya
  http://solr-ra.tgels.org
  http://rankingalgorithm.tgels.org
 
  ps. MbArtists index is the example index used in the Solr 1.4 Enterprise
  Book
 
 
 
 





Re: How can i search site name

2012-05-22 Thread Li Li
You should define your search first.
If the site is www.google.com, how do you want to match it: full string
matching or partial matching? E.g., should "google" match? If it
should, you should write your own analyzer for this field.

On Tue, May 22, 2012 at 2:03 PM, Shameema Umer shem...@gmail.com wrote:
 Sorry,
 Please let me know how can I search site name using the solr query syntax.
 My results should show title, url and content.
 Title and content are being searched even though the
 <defaultSearchField>content</defaultSearchField>.

 I need url or site name too. please, help.

 Thanks in advance.

 On Tue, May 22, 2012 at 11:05 AM, ketan kore ketankore...@gmail.com wrote:

 you can go on www.google.com and just type the site which you want to
 search and google will show you the results as simple as that ...



Re: Installing Solr on Tomcat using Shell - Code wrong?

2012-05-22 Thread Li Li
You should find some clues in the Tomcat log.
On 2012-5-22 7:49 PM, Spadez james_will...@hotmail.com wrote:

 Hi,

 This is the install process I used in my shell script to try and get Tomcat
 running with Solr (debian server):



 I swear this used to work, but currently only Tomcat works. The Solr page
 just comes up with The requested resource (/solr/admin) is not available.

 Can anyone give me some insight into why this isnt working? Its driving me
 nuts.

 James

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Installing-Solr-on-Tomcat-using-Shell-Code-wrong-tp3985393.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr query with mandatory values

2012-05-09 Thread Li Li
+ before term is correct. in lucene term includes field and value.

Query  ::= ( Clause )*

Clause ::= ["+", "-"] [<TERM> ":"] ( <TERM> | "(" Query ")" )

<#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> | "-" | "+" ) >

<#_ESCAPED_CHAR: "\\" ~[] >


In the Lucene query syntax, you can't express a term value that includes a space.
You can use quotation marks, but Lucene will treat that as a phrase query.
So you need to escape the space, like title:hello\\ world,
which will take "hello world" as the field value. The analyzer then
tokenizes it, so you should use an analyzer which can deal with
spaces, e.g. the keyword analyzer.

As far as I know.

On Thu, May 10, 2012 at 3:35 AM, Matt Kuiper matt.kui...@issinc.com wrote:
 Yes.

 See http://wiki.apache.org/solr/SolrQuerySyntax  - The standard Solr Query 
 Parser syntax is a superset of the Lucene Query Parser syntax.
 Which links to http://lucene.apache.org/core/3_6_0/queryparsersyntax.html

 Note - Based on the info on these pages I believe the + symbol is to be 
 placed just before the mandatory value, not before the field name in the 
 query.

 Matt Kuiper
 Intelligent Software Solutions

 -Original Message-
 From: G.Long [mailto:jde...@gmail.com]
 Sent: Wednesday, May 09, 2012 10:45 AM
 To: solr-user@lucene.apache.org
 Subject: Solr query with mandatory values

 Hi :)

 I remember that in a Lucene query, there is something like mandatory values. 
 I just have to add a + symbol in front of the mandatory parameter, like: 
 +myField:my value

 I was wondering if there was something similar in Solr queries? Or is this 
 behaviour activated by default?

 Gary




Re: SOLRJ: Is there a way to obtain a quick count of total results for a query

2012-05-04 Thread Li Li
Not scoring by relevance and sorting by document id instead may speed it up a little.
I haven't done any tests of this; maybe you can give it a try. Scoring will consume
some CPU time, and you just want to match and get the total count.
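
For reference, the rows=0 approach the original poster describes looks like this in
SolrJ (the URL and query are placeholders):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class CountOnly {
        public static void main(String[] args) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            SolrQuery q = new SolrQuery("field:value");
            q.setRows(0);                       // fetch no documents, only the header and numFound
            QueryResponse rsp = server.query(q);
            long total = rsp.getResults().getNumFound();
            System.out.println("numFound = " + total);
        }
    }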

On Wed, May 2, 2012 at 11:58 PM, vybe3142 vybe3...@gmail.com wrote:
 I can achieve this by building a query with start and rows = 0, and using
 queryResponse.getResults().getNumFound().

 Are there any more efficient approaches to this?

 Thanks

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SOLRJ-Is-there-a-way-to-obtain-a-quick-count-of-total-results-for-a-query-tp3955322.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sorting result first which come first in sentance

2012-05-03 Thread Li Li
For versions below 4.0 it's not possible, because of Lucene's scoring
model. Position information is stored, but it is only used to support phrase
queries: it just tells us whether a document matched, but we can't use it to boost
a document. A similar problem is how to implement proximity boosting:
for 2 search terms, we need to return all docs that contain the 2
terms, but if they occur as a phrase we give the doc the largest boost; if there is
one word between them, we give it a smaller one; if there are 2 words
between them, a smaller score still.
All such ranking algorithms need a more flexible scoring model.
I don't know whether the latest trunk takes this into consideration.

On Fri, May 4, 2012 at 3:43 AM, Jonty Rhods jonty.rh...@gmail.com wrote:
 Hi all,



 I need a suggestion:



 I have many title like:



 1 bomb blast in kabul

 2 kabul bomb blast

 3 3 people killed in serial bomb blast in kabul



 I want the 2nd result to come first when a user searches for kabul,
 because kabul is in the 1st position in that sentence. Similarly, the 1st
 result should come 2nd and the 3rd should come last.



 Please suggest how to implement this.



 Regard

 Jonty



Re: Sorting result first which come first in sentance

2012-05-03 Thread Li Li
For this version, you may consider using payloads for a position boost:
you can store the boost values in the payloads.
I have used this at the Lucene API level, where anchor text should weigh
more than normal text, but I haven't used it in Solr.
Some useful links:
http://wiki.apache.org/solr/Payloads
http://digitalpebble.blogspot.com/2010/08/using-payloads-with-dismaxqparser-in.html
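
A rough Lucene 3.x sketch of the query side (the field name and the idea of storing a float boost per position are assumptions; the payloads themselves have to be written at index time, e.g. with DelimitedPayloadTokenFilter):

import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.search.payloads.AveragePayloadFunction;
import org.apache.lucene.search.payloads.PayloadTermQuery;

// Similarity that turns the per-position payload into a score factor
class PositionBoostSimilarity extends DefaultSimilarity {
    @Override
    public float scorePayload(int docId, String fieldName, int start, int end,
                              byte[] payload, int offset, int length) {
        if (payload == null || length < 4) {
            return 1.0f;                                   // no payload, neutral boost
        }
        return PayloadHelper.decodeFloat(payload, offset); // float boost written at index time
    }
}

// usage sketch:
//   searcher.setSimilarity(new PositionBoostSimilarity());
//   Query q = new PayloadTermQuery(new Term("title", "kabul"), new AveragePayloadFunction());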


On Fri, May 4, 2012 at 9:51 AM, Jonty Rhods jonty.rh...@gmail.com wrote:
 I am using solr version 3.4


Re: get latest 50 documents the fastest way

2012-05-01 Thread Li Li
You could reverse your sort: maybe you can override the tf method of
Similarity and return -1.0f * tf() (I don't know whether the default
collector allows scores smaller than zero). Or you can hack it by adding a
large constant, or write your own collector; in its collect(int doc) method
you can do something like this:
collect(int doc) {
    float score = scorer.score();
    score *= -1.0f;
    ...
}
If you don't sort by relevance score, just set a Sort.
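
A sketch of the "just set Sort" route, assuming the index has a sortable numeric timestamp field (the field name is made up); this skips relevance scoring entirely:

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;

public class Latest50 {
    // returns the 50 newest matches, newest first, without computing relevance scores
    static TopDocs latest(IndexSearcher searcher, Query query) throws Exception {
        Sort byTimeDesc = new Sort(new SortField("timestamp", SortField.LONG, true));
        return searcher.search(query, null, 50, byTimeDesc);
    }
}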

On Tue, May 1, 2012 at 10:38 PM, Yuval Dotan yuvaldo...@gmail.com wrote:
 Hi Guys
 We have a use case where we need to get the 50 *latest *documents that
 match my query - without additional ranking,sorting,etc on the results.
 My index contains 1,000,000,000 documents and i noticed that if the number
 of found documents is very big (larger than 50% of the index size -
 500,000,000 docs) than it takes more than 5 seconds to get the results even
 with rows=50 parameter.
 Is there a way to get the results faster?
 Thanks
 Yuval


question about NRT(soft commit) and Transaction Log in trunk

2012-04-28 Thread Li Li
hi
   I checked out the trunk and played with its new soft commit
feature. It's cool, but I've got a few questions about it.
   From reading some introductory articles and the wiki, and a hasty
reading of the code, my understanding of the implementation is:
   For a normal (hard) commit, we flush everything to disk and commit it.
The flush is not very time consuming because of the OS-level cache; the
most time-consuming part is the sync in the commit process.
   A soft commit just flushes postings and pending deletions to disk,
generating new segments. Then Solr can use a new searcher to read the
latest index, warm it up and register it.
   If there is no hard commit and the JVM crashes, new data may be lost.
   If my understanding is correct, then why do we need the transaction log?
   I found that in DirectUpdateHandler2, every time a command is executed,
TransactionLog records a line in the log. But the default sync level in
RunUpdateProcessorFactory is flush, which means it will not sync the log
file. Does this make sense?
   In database implementations, we usually write the log and modify data in
memory because the log is smaller than the real data; if the process
crashes, we can redo the unfinished log entries and make the data correct.
Will Solr leverage the log like this? If so, why isn't it synced?


Re: Solr Scoring

2012-04-13 Thread Li Li
Another way is to use payloads: http://wiki.apache.org/solr/Payloads
The advantage of payloads is that you only need one field and the frq file
can be smaller than with two fields, but the disadvantage is that payloads
are stored in the prx file, so I am not sure which one is faster. Maybe you
can try them both.
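
For the two-field approach Walter describes below, a SolrJ sketch of weighting the exact field above the stemmed one (the field names and boosts are assumptions):

import org.apache.solr.client.solrj.SolrQuery;

public class ExactOverStemmed {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("Edges");
        q.set("defType", "dismax");                  // use the dismax query parser
        q.set("qf", "itemDescExact^4 itemDescStem"); // exact-match field gets the bigger weight
        System.out.println(q);
    }
}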

On Fri, Apr 13, 2012 at 8:04 AM, Erick Erickson erickerick...@gmail.comwrote:

 GAH! I had my head in make this happen in one field when I wrote my
 response, without being explicit. Of course Walter's solution is pretty
 much the standard way to deal with this.

 Best
 Erick

 On Thu, Apr 12, 2012 at 5:38 PM, Walter Underwood wun...@wunderwood.org
 wrote:
  It is easy. Create two fields, text_exact and text_stem. Don't use the
 stemmer in the first chain, do use the stemmer in the second. Give the
 text_exact a bigger weight than text_stem.
 
  wunder
 
  On Apr 12, 2012, at 4:34 PM, Erick Erickson wrote:
 
  No, I don't think there's an OOB way to make this happen. It's
  a recurring theme, make exact matches score higher than
  stemmed matches.
 
  Best
  Erick
 
  On Thu, Apr 12, 2012 at 5:18 AM, Kissue Kissue kissue...@gmail.com
 wrote:
  Hi,
 
  I have a field in my index called itemDesc which i am applying
  EnglishMinimalStemFilterFactory to. So if i index a value to this field
  containing Edges, the EnglishMinimalStemFilterFactory applies
 stemming
  and Edges becomes Edge. Now when i search for Edges, documents
 with
  Edge score better than documents with the actual search word -
 Edges.
  Is there a way i can make documents with the actual search word in this
  case Edges score better than document with Edge?
 
  I am using Solr 3.5. My field definition is shown below:
 
  <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPossessiveFilterFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPossessiveFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
    </analyzer>
  </fieldType>
 
  Thanks.
 
 
 
 
 



Re: How to read SOLR cache statistics?

2012-04-13 Thread Li Li
http://wiki.apache.org/solr/SolrCaching

On Fri, Apr 13, 2012 at 2:30 PM, Kashif Khan uplink2...@gmail.com wrote:

 Does anyone explain what does the following parameters mean in SOLR cache
 statistics?

 *name*:  queryResultCache
 *class*:  org.apache.solr.search.LRUCache
 *version*:  1.0
 *description*:  LRU Cache(maxSize=512, initialSize=512)
 *stats*:  lookups : 98
 *hits *: 59
 *hitratio *: 0.60
 *inserts *: 41
 *evictions *: 0
 *size *: 41
 *warmupTime *: 0
 *cumulative_lookups *: 98
 *cumulative_hits *: 59
 *cumulative_hitratio *: 0.60
 *cumulative_inserts *: 39
 *cumulative_evictions *: 0

 AND also this


 *name*:  fieldValueCache
 *class*:  org.apache.solr.search.FastLRUCache
 *version*:  1.0
 *description*:  Concurrent LRU Cache(maxSize=1, initialSize=10,
 minSize=9000, acceptableSize=9500, cleanupThread=false)
 *stats*:  *lookups *: 8
 *hits *: 4
 *hitratio *: 0.50
 *inserts *: 2
 *evictions *: 0
 *size *: 2
 *warmupTime *: 0
 *cumulative_lookups *: 8
 *cumulative_hits *: 4
 *cumulative_hitratio *: 0.50
 *cumulative_inserts *: 2
 *cumulative_evictions *: 0
 *item_ABC *:

 {field=ABC,memSize=340592,tindexSize=1192,time=1360,phase1=1344,nTerms=7373,bigTerms=1,termInstances=11513,uses=4}
 *item_BCD *:

 {field=BCD,memSize=341248,tindexSize=1952,time=1688,phase1=1688,nTerms=8075,bigTerms=0,termInstances=13510,uses=2}

 Without understanding these terms i cannot configure server for better
 cache
 usage. The point is searches are very slow. These stats were taken when
 server was down and restarted. I just want to understand what these terms
 mean actually


 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-read-SOLR-cache-statistics-tp3907294p3907294.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: using solr to do a 'match'

2012-04-11 Thread Li Li
It's not possible now because Lucene doesn't support this.
When doing a disjunction query, it only records how many terms match the
document. I think this is a common requirement for many users.
I suggest Lucene should split the scorer into a matcher and a scorer:
the matcher just returns which doc matched and why/how it matched.
Especially for disjunction queries it should tell which terms matched and
possibly other information such as tf/idf and the distance between terms
(to support proximity search). That's the matcher's job; the scorer (a
ranking algorithm) then uses a flexible algorithm to score the document,
and the collector can collect it.

On Wed, Apr 11, 2012 at 10:28 AM, Chris Book chrisb...@gmail.com wrote:

 Hello, I have a solr index running that is working very well as a search.
  But I want to add the ability (if possible) to use it to do matching.  The
 problem is that by default it is only looking for all the input terms to be
 present, and it doesn't give me any indication as to how many terms in the
 target field were not specified by the input.

 For example, if I'm trying to match to the song title dust in the wind,
 I'm correctly getting a match if the input query is dust in wind.  But I
 don't want to get a match if the input is just dust.  Although as a
 search dust should return this result, I'm looking for some way to filter
 this out based on some indication that the input isn't close enough to the
 output.  Perhaps if I could get information that that the number of input
 terms is much less than the number of terms in the field.  Or something
 else along those line?

 I realize that this isn't the typical use case for a search, but I'm just
 looking for some suggestions as to how I could improve the above example a
 bit.

 Thanks,
 Chris



Re: using solr to do a 'match'

2012-04-11 Thread Li Li
I searched my mail but found nothing. The thread found by the keywords
boolean expression is Indexing Boolean Expressions from joaquin.delgado.
To tell which terms matched: for BooleanScorer2, a simple method is to
modify DisjunctionSumScorer and add a BitSet to record the matched scorers.
When the collector collects a document, it can get the scorer and
recursively find the matched terms.
But I think maybe it's better to add a component, maybe named matcher, that
does the matching job, and the scorer then uses the information from the
matcher to do the ranking.
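
In the meantime, along the lines of the minShouldMatch idea below, a Lucene sketch that only matches documents containing most of the input terms (the field name and the all-but-one threshold are assumptions):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class MostTermsMatch {
    static BooleanQuery build(String field, String[] terms) {
        BooleanQuery bq = new BooleanQuery();
        for (String t : terms) {
            bq.add(new TermQuery(new Term(field, t)), Occur.SHOULD);
        }
        // require nearly all input terms, so "dust" alone will not match "dust in the wind"
        bq.setMinimumNumberShouldMatch(Math.max(1, terms.length - 1));
        return bq;
    }
}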

On Wed, Apr 11, 2012 at 4:32 PM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 Hi,

 This use case is similar to matching boolean expression problem. You can
 find recent thread about it. I have an idea that we can introduce
 disjunction query with dynamic mm (minShouldMatch parameter

 http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/BooleanQuery.html#setMinimumNumberShouldMatch(int)
 )
 i.e. 'match these clauses disjunctively but for every document use
 value
 from field cache of field xxxCount as a minShouldMatch parameter'. Also
 norms can be used as a source for dynamics mm values.

 Wdyt?

 On Wed, Apr 11, 2012 at 10:08 AM, Li Li fancye...@gmail.com wrote:

  it's not possible now because lucene don't support this.
  when doing disjunction query, it only record how many terms match this
  document.
  I think this is a common requirement for many users.
  I suggest lucene should divide scorer to a matcher and a scorer.
  the matcher just return which doc is matched and why/how the doc is
  matched.
  especially for disjuction query, it should tell which term matches and
  possible other
  information such as tf/idf and the distance of terms(to support proximity
  search).
  That's the matcher's job. and then the scorer(a ranking algorithm) use
  flexible algorithm
  to score this document and the collector can collect it.
 
  On Wed, Apr 11, 2012 at 10:28 AM, Chris Book chrisb...@gmail.com
 wrote:
 
   Hello, I have a solr index running that is working very well as a
 search.
But I want to add the ability (if possible) to use it to do matching.
   The
   problem is that by default it is only looking for all the input terms
 to
  be
   present, and it doesn't give me any indication as to how many terms in
  the
   target field were not specified by the input.
  
   For example, if I'm trying to match to the song title dust in the
 wind,
   I'm correctly getting a match if the input query is dust in wind.
  But
  I
   don't want to get a match if the input is just dust.  Although as a
   search dust should return this result, I'm looking for some way to
  filter
   this out based on some indication that the input isn't close enough to
  the
   output.  Perhaps if I could get information that that the number of
 input
   terms is much less than the number of terms in the field.  Or something
   else along those line?
  
   I realize that this isn't the typical use case for a search, but I'm
 just
   looking for some suggestions as to how I could improve the above
 example
  a
   bit.
  
   Thanks,
   Chris
  
 



 --
 Sincerely yours
 Mikhail Khludnev
 ge...@yandex.ru

 http://www.griddynamics.com
  mkhlud...@griddynamics.com



Re: pagerank??

2012-04-04 Thread Bing Li
According to my knowledge, Solr cannot support this.

In my case, I get data by keyword-matching from Solr and then rank the data
by PageRank after that.

Thanks,
Bing

On Wed, Apr 4, 2012 at 6:37 AM, Manuel Antonio Novoa Proenza 
mano...@estudiantes.uci.cu wrote:

 Hello,

 I have in my Solr index , many indexed documents.

 Let me know any way or efficient function to calculate the page rank of
 websites indexed.




Re: Trouble Setting Up Development Environment

2012-03-24 Thread Li Li
. Runtime ClassNotFoundExceptions may result.
  solr3_5P/solr3_5Classpath Dependency Validator Message
 Classpath entry /solr3_5/ssrc/solr/lib/guava-r05.jar will not be exported
 or published. Runtime ClassNotFoundExceptions may result.  solr3_5
  P/solr3_5Classpath Dependency Validator Message
 Classpath entry /solr3_5/ssrc/solr/lib/jcl-over-slf4j-1.6.1.jar will not
 be exported or published. Runtime ClassNotFoundExceptions may result.
  solr3_5P/solr3_5Classpath Dependency Validator Message
 Classpath entry /solr3_5/ssrc/solr/lib/junit-4.7.jar will not be exported
 or published. Runtime ClassNotFoundExceptions may result.  solr3_5
  P/solr3_5Classpath Dependency Validator Message
 Classpath entry /solr3_5/ssrc/solr/lib/servlet-api-2.4.jar will not be
 exported or published. Runtime ClassNotFoundExceptions may result.
  solr3_5P/solr3_5Classpath Dependency Validator Message
 Classpath entry /solr3_5/ssrc/solr/lib/slf4j-api-1.6.1.jar will not be
 exported or published. Runtime ClassNotFoundExceptions may result.
  solr3_5P/solr3_5Classpath Dependency Validator Message
 Classpath entry /solr3_5/ssrc/solr/lib/slf4j-jdk14-1.6.1.jar will not be
 exported or published. Runtime ClassNotFoundExceptions may result.
  solr3_5P/solr3_5Classpath Dependency Validator Message
 Classpath entry /solr3_5/ssrc/solr/lib/wstx-asl-3.2.7.jar will not be
 exported or published. Runtime ClassNotFoundExceptions may result.
  solr3_5P/solr3_5Classpath Dependency Validator Message



 On Fri, Mar 23, 2012 at 3:25 AM, Li Li fancye...@gmail.com wrote:

 here is my method.
 1. check out latest source codes from trunk or download tar ball
 svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk lucene_trunk

 2. create a dynamic web project in eclipse and close it.
   for example, I create a project name lucene-solr-trunk in my
 workspace.

 3. copy/mv the source code to this project(it's not necessary)
   here is my directory structure
   lili@lili-desktop:~/workspace/lucene-solr-trunk$ ls
 bin.tests-framework  build  lucene_trunk  src  testindex  WebContent
  lucene_trunk is the top directory checked out from svn in step 1.
 4. remove WebContent generated by eclipse and modify it to a soft link to
  lili@lili-desktop:~/workspace/lucene-solr-trunk$ ll WebContent
 lrwxrwxrwx 1 lili lili 28 2011-08-18 18:50 WebContent -
 lucene_trunk/solr/webapp/web/
 5. open lucene_trunk/dev-tools/eclipse/dot.classpath. copy all lines like
 kind=src to a temp file
  <classpathentry kind="src" path="lucene/core/src/java"/>
  <classpathentry kind="src" path="lucene/core/src/resources"/>
 
 6. replace all string like path=xxx to path=lucene_trunk/xxx and copy
 them into .classpath file
 7. mkdir WebContent/WEB-INF/lib
 8. extract all jar file in dot.classpath to WebContent/WEB-INF/lib
I use this command:
    lili@lili-desktop:~/workspace/lucene-solr-trunk/lucene_trunk$ cat dev-tools/eclipse/dot.classpath | grep 'kind="lib"' | awk -F 'path="' '{print $2}' | awk -F '"' '{print $1}' | xargs -I{} cp {} ../WebContent/WEB-INF/lib/
 9. open this project and refresh it.
if everything is ok, it will compile all java files successfully. if
 there is something wrong, Probably we don't use the correct jar. because
 there are many versions of the same library.
 10. right click the project - debug As - debug on Server
it will fail because no solr home is specified.
 11. right click the project - debug As - debug Configuration -
 Arguments
 Tab - VM arguments
 add

 -Dsolr.solr.home=/home/lili/workspace/lucene-solr-trunk/lucene_trunk/solr/example/solr
 you can also add other vm arguments like -Xmx1g here.
 12. all fine, add a break point at SolrDispatchFilter.doFilter(). all solr
 request comes here
 13. have fun~


 On Fri, Mar 23, 2012 at 11:49 AM, Karthick Duraisamy Soundararaj 
 karthick.soundara...@gmail.com wrote:

  Hi Solr Ppl,
 I have been trying to set up solr dev env. I downloaded
 the
  tar ball of eclipse and the solr 3.5 source. Here are the exact
 sequence of
  steps I followed
 
  I extracted the solr 3.5 source and eclipse.
  I installed run-jetty-run plugin for eclipse.
  I ran ant eclipse in the solr 3.5 source directory
  I used eclipse's Open existing project option to open up the files in
  solr 3.5 directory. I got a huge tree in the name of lucene_solr.
 
  I run it and there is a SEVERE error: System property not set
 excetption. *
  solr*.test.sys.*prop1* not set and then the jetty loads solr. I then try
  localhost:8080/solr/select/ I get null pointer execpiton. I am only
 able to
  access admin page.
 
  Is there anything else I need to do?
 
  I tried to follow
 
 
 http://www.lucidimagination.com/devzone/technical-articles/setting-apache-solr-eclipse
  .
  But I dont find the solr-3.5.war file. I tried ant dist to generate the
  dist folder but that has many jars and wars..
 
  I am able to compile the source

Re: Trouble Setting Up Development Environment

2012-03-23 Thread Li Li
here is my method.
1. check out latest source codes from trunk or download tar ball
svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk lucene_trunk

2. create a dynamic web project in eclipse and close it.
   for example, I create a project name lucene-solr-trunk in my
workspace.

3. copy/mv the source code to this project(it's not necessary)
   here is my directory structure
   lili@lili-desktop:~/workspace/lucene-solr-trunk$ ls
bin.tests-framework  build  lucene_trunk  src  testindex  WebContent
  lucene_trunk is the top directory checked out from svn in step 1.
4. remove WebContent generated by eclipse and modify it to a soft link to
  lili@lili-desktop:~/workspace/lucene-solr-trunk$ ll WebContent
lrwxrwxrwx 1 lili lili 28 2011-08-18 18:50 WebContent -
lucene_trunk/solr/webapp/web/
5. open lucene_trunk/dev-tools/eclipse/dot.classpath. copy all lines like
kind=src to a temp file
<classpathentry kind="src" path="lucene/core/src/java"/>
<classpathentry kind="src" path="lucene/core/src/resources"/>

6. replace all string like path=xxx to path=lucene_trunk/xxx and copy
them into .classpath file
7. mkdir WebContent/WEB-INF/lib
8. extract all jar file in dot.classpath to WebContent/WEB-INF/lib
I use this command:
lili@lili-desktop:~/workspace/lucene-solr-trunk/lucene_trunk$ cat dev-tools/eclipse/dot.classpath | grep 'kind="lib"' | awk -F 'path="' '{print $2}' | awk -F '"' '{print $1}' | xargs -I{} cp {} ../WebContent/WEB-INF/lib/
9. open this project and refresh it.
if everything is ok, it will compile all java files successfully. if
there is something wrong, Probably we don't use the correct jar. because
there are many versions of the same library.
10. right click the project - debug As - debug on Server
it will fail because no solr home is specified.
11. right click the project - debug As - debug Configuration - Arguments
Tab - VM arguments
 add
-Dsolr.solr.home=/home/lili/workspace/lucene-solr-trunk/lucene_trunk/solr/example/solr
 you can also add other vm arguments like -Xmx1g here.
12. all fine, add a break point at SolrDispatchFilter.doFilter(). all solr
request comes here
13. have fun~


On Fri, Mar 23, 2012 at 11:49 AM, Karthick Duraisamy Soundararaj 
karthick.soundara...@gmail.com wrote:

 Hi Solr Ppl,
I have been trying to set up solr dev env. I downloaded the
 tar ball of eclipse and the solr 3.5 source. Here are the exact sequence of
 steps I followed

 I extracted the solr 3.5 source and eclipse.
 I installed run-jetty-run plugin for eclipse.
 I ran ant eclipse in the solr 3.5 source directory
 I used eclipse's Open existing project option to open up the files in
 solr 3.5 directory. I got a huge tree in the name of lucene_solr.

 I run it and there is a SEVERE error: System property not set excetption. *
 solr*.test.sys.*prop1* not set and then the jetty loads solr. I then try
 localhost:8080/solr/select/ I get null pointer execpiton. I am only able to
 access admin page.

 Is there anything else I need to do?

 I tried to follow

 http://www.lucidimagination.com/devzone/technical-articles/setting-apache-solr-eclipse
 .
 But I dont find the solr-3.5.war file. I tried ant dist to generate the
 dist folder but that has many jars and wars..

 I am able to compile the source with ant compile, get the solr in example
 directory up and running.

 Will be great if someone can help me with this.

 Thanks,
 Karthick



Re: How to avoid the unexpected character error?

2012-03-16 Thread Li Li
It's not the right place.
When you use java -Durl=http://... -jar post.jar data.xml,
the data.xml file must be a valid XML file, so you should escape special
chars in that file.
I don't know how you generate this file.
If you use a Java program (or another script) to generate it, you should
use XML tools to generate it. But if you build it by hand like this:
StringBuilder buf = new StringBuilder();
buf.append("<add>");
buf.append("<doc>");
buf.append("<field name=\"fname\">text content</field>");
then you should escape the special chars yourself.
If you use Java, you can make use of the org.apache.solr.common.util.XML class.
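
A small sketch of that (the field name and value are made up; XML.escapeCharData writes the escaped form to a Writer):

import java.io.StringWriter;
import org.apache.solr.common.util.XML;

public class BuildDoc {
    public static void main(String[] args) throws Exception {
        StringWriter sw = new StringWriter();
        XML.escapeCharData("5 < 6 & more", sw);   // escapes XML special characters such as & and <
        String value = sw.toString();
        String doc = "<add><doc><field name=\"fname\">" + value + "</field></doc></add>";
        System.out.println(doc);
    }
}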

On Fri, Mar 16, 2012 at 2:03 PM, neosky neosk...@yahoo.com wrote:

 I am sorry, but I can't get what you mean.
 I tried the  HTMLStripCharFilter and PatternReplaceCharFilter. It doesn't
 work.
 Could you give me an example? Thanks!

  fieldType name=text_html class=solr.TextField
 positionIncrementGap=100
   analyzer
 charFilter class=solr.HTMLStripCharFilterFactory/
 tokenizer class=solr.WhitespaceTokenizerFactory/
   /analyzer
  /fieldType

 I also tried:

 charFilter class=solr.PatternReplaceCharFilterFactory pattern=([^a-z])
 replacement=
 maxBlockChars=1 blockDelimiters=|/

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-avoid-the-unexpected-character-error-tp3824726p3831064.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr out of memory exception

2012-03-15 Thread Li Li
It seems you are using a 64-bit JVM (a 32-bit JVM can only allocate about 1.5GB).
You should enable pointer compression with -XX:+UseCompressedOops.

On Thu, Mar 15, 2012 at 1:58 PM, Husain, Yavar yhus...@firstam.com wrote:

 Thanks for helping me out.

 I have allocated Xms-2.0GB Xmx-2.0GB

 However i see Tomcat is still using pretty less memory and not 2.0G

 Total Memory on my Windows Machine = 4GB.

 With smaller index size it is working perfectly fine. I was thinking of
 increasing the system RAM  tomcat heap space allocated but then how come
 on a different server with exactly same system and solr configuration 
 memory it is working fine?


 -Original Message-
 From: Li Li [mailto:fancye...@gmail.com]
 Sent: Thursday, March 15, 2012 11:11 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr out of memory exception

 how many memory are allocated to JVM?

 On Thu, Mar 15, 2012 at 1:27 PM, Husain, Yavar yhus...@firstam.com
 wrote:

  Solr is giving out of memory exception. Full Indexing was completed fine.
  Later while searching maybe when it tries to load the results in memory
 it
  starts giving this exception. Though with the same memory allocated to
  Tomcat and exactly same solr replica on another server it is working
  perfectly fine. I am working on 64 bit software's including Java  Tomcat
  on Windows.
  Any help would be appreciated.
 
  Here are the logs:
 
  The server encountered an internal error (Severe errors in solr
  configuration. Check your log files for more detailed information on what
  may be wrong. If you want solr to continue after configuration errors,
  change: abortOnConfigurationErrorfalse/abortOnConfigurationError in
  null -
  java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
 at
  org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at
  org.apache.solr.core.SolrCore.init(SolrCore.java:579) at
 
 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
  at
 
 org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
  at
 
 org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
  at
 
 org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:115)
  at
 
 org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
  at
  org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
  at
 
 org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
  at
 org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
  at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
 at
  org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:943) at
  org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:778) at
  org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:504) at
  org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317) at
 
 org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
  at
 
 org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
  at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
 at
  org.apache.catalina.core.StandardHost.start(StandardHost.java:840) at
  org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057) at
  org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463) at
  org.apache.catalina.core.StandardService.start(StandardService.java:525)
 at
  org.apache.catalina.core.StandardServer.start(StandardServer.java:754) at
  org.apache.catalina.startup.Catalina.start(Catalina.java:595) at
  sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
  sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at
  sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at
  java.lang.reflect.Method.invoke(Unknown Source) at
  org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289) at
  org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414) Caused by:
  java.lang.OutOfMemoryError: Java heap space at
 
 org.apache.lucene.index.SegmentTermEnum.termInfo(SegmentTermEnum.java:180)
  at
 org.apache.lucene.index.TermInfosReader.init(TermInfosReader.java:91)
  at
 
 org.apache.lucene.index.SegmentReader$CoreReaders.init(SegmentReader.java:122)
  at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:652) at
  org.apache.lucene.index.SegmentReader.get(SegmentReader.java:613) at
  org.apache.lucene.index.DirectoryReader.init(DirectoryReader.java:104)
 at
 
 org.apache.lucene.index.ReadOnlyDirectoryReader.init(ReadOnlyDirectoryReader.java:27)
  at
  org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:74)
  at
 
 org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java

Re: Solr out of memory exception

2012-03-15 Thread Li Li
It can reduce memory usage. For small-heap applications (less than 4GB) it
may also speed things up, but be careful: for large-heap applications it
depends, and you should run some tests yourself.
Our application's test result was: it reduced memory usage but increased
response time. We use 25GB of heap.

http://lists.apple.com/archives/java-dev/2010/Apr/msg00157.html

Dyer, James james.d...@ingrambook.com wrote on 3/18/11 to solr-user:
Our tests showed, in our situation, the compressed oops flag caused our
minor (ParNew) generation time to decrease significantly.   We're using a
larger heap (22gb) and our index size is somewhere in the 40's gb total.  I
guess with any of these jvm parameters, it all depends on your situation
and you need to test.  In our case, this flag solved a real problem we were
having.  Whoever wrote the JRocket book you refer to no doubt had other
scenarios in mind...

On Thu, Mar 15, 2012 at 3:02 PM, C.Yunqin 345804...@qq.com wrote:

 why should enable pointer compression?




 -- Original --
 From:  Li Lifancye...@gmail.com;
 Date:  Thu, Mar 15, 2012 02:41 PM
 To:  Husain, Yavaryhus...@firstam.com;
 Cc:  solr-user@lucene.apache.orgsolr-user@lucene.apache.org;
 Subject:  Re: Solr out of memory exception


 it seems you are using 64bit jvm(32bit jvm can only allocate about 1.5GB).
 you should enable pointer compression by -XX:+UseCompressedOops

 On Thu, Mar 15, 2012 at 1:58 PM, Husain, Yavar yhus...@firstam.com
 wrote:

  Thanks for helping me out.
 
  I have allocated Xms-2.0GB Xmx-2.0GB
 
  However i see Tomcat is still using pretty less memory and not 2.0G
 
  Total Memory on my Windows Machine = 4GB.
 
  With smaller index size it is working perfectly fine. I was thinking of
  increasing the system RAM  tomcat heap space allocated but then how come
  on a different server with exactly same system and solr configuration 
  memory it is working fine?
 
 
  -Original Message-
  From: Li Li [mailto:fancye...@gmail.com]
  Sent: Thursday, March 15, 2012 11:11 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Solr out of memory exception
 
  how many memory are allocated to JVM?
 
  On Thu, Mar 15, 2012 at 1:27 PM, Husain, Yavar yhus...@firstam.com
  wrote:
 
   Solr is giving out of memory exception. Full Indexing was completed
 fine.
   Later while searching maybe when it tries to load the results in memory
  it
   starts giving this exception. Though with the same memory allocated to
   Tomcat and exactly same solr replica on another server it is working
   perfectly fine. I am working on 64 bit software's including Java 
 Tomcat
   on Windows.
   Any help would be appreciated.
  
   Here are the logs:
  
   The server encountered an internal error (Severe errors in solr
   configuration. Check your log files for more detailed information on
 what
   may be wrong. If you want solr to continue after configuration errors,
   change: abortOnConfigurationErrorfalse/abortOnConfigurationError in
   null -
   java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
  at
   org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at
   org.apache.solr.core.SolrCore.init(SolrCore.java:579) at
  
 
 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
   at
  
 
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
   at
  
 
 org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
   at
  
 
 org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
   at
  
 
 org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:115)
   at
  
 
 org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
   at
  
 org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
   at
  
 
 org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
   at
  org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
   at
 org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
  at
   org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:943)
 at
   org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:778)
 at
   org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:504)
 at
   org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317) at
  
 
 org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
   at
  
 
 org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
   at
 org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
  at
   org.apache.catalina.core.StandardHost.start(StandardHost.java:840) at
   org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057

Re: Sorting on non-stored field

2012-03-14 Thread Li Li
It should be indexed but not analyzed; it doesn't need to be stored.
Reading field values from stored fields is extremely slow, so Lucene uses
the FieldCache StringIndex to read field values for sorting. So if you want
to sort by some field, index that field and don't analyze it.
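
A Lucene-level sketch (the field name is an example): the sort key is indexed un-analyzed and not stored, and sorting on it works fine:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

public class SortKeyExample {
    // add the sort key as indexed, not analyzed, not stored
    static Document withSortKey(Document doc, String sku) {
        doc.add(new Field("sku", sku, Field.Store.NO, Field.Index.NOT_ANALYZED));
        return doc;
    }

    // sort on it as a string field
    static Sort bySku() {
        return new Sort(new SortField("sku", SortField.STRING));
    }
}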

On Wed, Mar 14, 2012 at 6:43 PM, Finotti Simone tech...@yoox.com wrote:

 I was wondering: is it possible to sort a Solr result-set on a non-stored
 value?

 Thank you


Re: How to avoid the unexpected character error?

2012-03-14 Thread Li Li
There is a class org.apache.solr.common.util.XML in Solr.
You can use this wrapper:
public static String escapeXml(String s) throws IOException {
    StringWriter sw = new StringWriter();
    XML.escapeCharData(s, sw);
    return sw.getBuffer().toString();
}

On Wed, Mar 14, 2012 at 4:34 PM, neosky neosk...@yahoo.com wrote:

 I use the xml to index the data. One filed might contains some characters
 like '' =
 It seems that will produce the error
 I modify that filed doesn't index, but it doesn't work. I need to store the
 filed, but index might not be indexed.
 Thanks!

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-avoid-the-unexpected-character-error-tp3824726p3824726.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to avoid the unexpected character error?

2012-03-14 Thread Li Li
No, it has nothing to do with schema.xml.
post.jar just posts a file; it doesn't parse it.
Solr will use an XML parser to parse the file, and if you don't escape
special characters it's not a valid XML file and Solr will throw exceptions.

On Thu, Mar 15, 2012 at 12:33 AM, neosky neosk...@yahoo.com wrote:

 Thanks!
 Does the schema.xml support this parameter? I am using the example post.jar
 to index my file.

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-avoid-the-unexpected-character-error-tp3824726p3825959.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr out of memory exception

2012-03-14 Thread Li Li
How much memory is allocated to the JVM?

On Thu, Mar 15, 2012 at 1:27 PM, Husain, Yavar yhus...@firstam.com wrote:

 Solr is giving out of memory exception. Full Indexing was completed fine.
 Later while searching maybe when it tries to load the results in memory it
 starts giving this exception. Though with the same memory allocated to
 Tomcat and exactly same solr replica on another server it is working
 perfectly fine. I am working on 64 bit software's including Java  Tomcat
 on Windows.
 Any help would be appreciated.

 Here are the logs:

 The server encountered an internal error (Severe errors in solr
 configuration. Check your log files for more detailed information on what
 may be wrong. If you want solr to continue after configuration errors,
 change: abortOnConfigurationErrorfalse/abortOnConfigurationError in
 null -
 java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space at
 org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at
 org.apache.solr.core.SolrCore.init(SolrCore.java:579) at
 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
 at
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
 at
 org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
 at
 org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
 at
 org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:115)
 at
 org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
 at
 org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
 at
 org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
 at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
 at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601) at
 org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:943) at
 org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:778) at
 org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:504) at
 org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317) at
 org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
 at
 org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
 at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065) at
 org.apache.catalina.core.StandardHost.start(StandardHost.java:840) at
 org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057) at
 org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463) at
 org.apache.catalina.core.StandardService.start(StandardService.java:525) at
 org.apache.catalina.core.StandardServer.start(StandardServer.java:754) at
 org.apache.catalina.startup.Catalina.start(Catalina.java:595) at
 sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
 sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at
 java.lang.reflect.Method.invoke(Unknown Source) at
 org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289) at
 org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414) Caused by:
 java.lang.OutOfMemoryError: Java heap space at
 org.apache.lucene.index.SegmentTermEnum.termInfo(SegmentTermEnum.java:180)
 at org.apache.lucene.index.TermInfosReader.init(TermInfosReader.java:91)
 at
 org.apache.lucene.index.SegmentReader$CoreReaders.init(SegmentReader.java:122)
 at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:652) at
 org.apache.lucene.index.SegmentReader.get(SegmentReader.java:613) at
 org.apache.lucene.index.DirectoryReader.init(DirectoryReader.java:104) at
 org.apache.lucene.index.ReadOnlyDirectoryReader.init(ReadOnlyDirectoryReader.java:27)
 at
 org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:74)
 at
 org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683)
 at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69) at
 org.apache.lucene.index.IndexReader.open(IndexReader.java:476) at
 org.apache.lucene.index.IndexReader.open(IndexReader.java:403) at
 org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
 at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1057) at
 org.apache.solr.core.SolrCore.init(SolrCore.java:579) at
 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
 at
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
 at
 org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
 at
 org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
 at
 org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:115)
 at
 

Re: index size with replication

2012-03-13 Thread Li Li
Optimize generates new segments and deletes the old ones. If your master
also serves searches during indexing, the old files may still be held open
by an old SolrIndexSearcher; they will be deleted later. So while indexing,
the index size may double, but a moment later the old segments will be
deleted.

On Wed, Mar 14, 2012 at 7:06 AM, Mike Austin mike.aus...@juggle.com wrote:

 I have a master with two slaves.  For some reason on the master if I do an
 optimize after indexing on the master it double in size from 42meg to 90
 meg.. however,  when the slaves replicate they get the 42meg index..

 Should the master and slaves always be the same size?

 Thanks,
 Mike



Re: How to limit the number of open searchers?

2012-03-06 Thread Li Li
What do you mean by programmatically? Modify the code of Solr? Because Solr
is not like Lucene: it only provides HTTP interfaces to its users, not a
Java API.

If you want to modify Solr, you can find this in SolrCore:
private final LinkedList<RefCounted<SolrIndexSearcher>> _searchers = new LinkedList<RefCounted<SolrIndexSearcher>>();
and _searcher is the current searcher.
Be careful to use searcherLock to synchronize your code.
Maybe you can write your code like:

synchronized (searcherLock) {
    if (_searchers.size() == 1) {
        ...
    }
}




On Tue, Mar 6, 2012 at 3:18 AM, Michael Ryan mr...@moreover.com wrote:

 Is there a way to limit the number of searchers that can be open at a
 given time?  I know there is a maxWarmingSearchers configuration that
 limits the number of warming searchers, but that's not quite what I'm
 looking for...

 Ideally, when I commit, I want there to only be one searcher open before
 the commit, so that during the commit and warming, there is a max of two
 searchers open.  I'd be okay with delaying the commit until there is only
 one searcher open.  Is there a way to programmatically determine how many
 searchers are currently open?

 -Michael



Re: Fw:how to make fdx file

2012-03-04 Thread Li Li
Lucene never modifies old segment files; it just flushes to a new segment
or merges old segments into a new one, and after merging the old segments
are deleted.
Once a file (such as fdt or fdx) is generated, it is never re-generated.
The only possibilities are that something went wrong while it was being
generated, or that it was deleted by another program or wrongly deleted by
a human.

On Sat, Mar 3, 2012 at 2:33 PM, C.Yunqin 345804...@qq.com wrote:

 yes,the fdt file still is there.  can i make new fdx file through fdt file.
  is there a posibilty that  during the process of updating and optimizing,
 the index will be deleted then re-generated?



  -- Original --
  From:  Erick Ericksonerickerick...@gmail.com;
  Date:  Sat, Mar 3, 2012 08:28 AM
  To:  solr-usersolr-user@lucene.apache.org;

  Subject:  Re: Fw:how to make fdx file


 As far as I know, fdx files don't just disappear, so I can only assume
 that something external removed it.

 That said, if you somehow re-indexed and had no fields where
 stored=true, then the fdx file may not be there.

 Are you seeing problems as a result? This file is used to store
 index information for stored fields. Do you have an fdt file?

 Best
 Erick

 On Fri, Mar 2, 2012 at 2:48 AM, C.Yunqin 345804...@qq.com wrote:
  Hi ,
my fdx file was unexpected gone, then the solr sever stop running;
 what I can do to recover solr?
 
   Other files still exist.
 
   Thanks very much
 
 
  /:includetail



Re: Solr HBase - Re: How is Data Indexed in HBase?

2012-02-23 Thread Bing Li
Dear Mr Gupta,

Your understanding of my solution is correct. Now both HBase and Solr
are used in my system. I hope it could work.

Thanks so much for your reply!

Best regards,
Bing

On Fri, Feb 24, 2012 at 3:30 AM, T Vinod Gupta tvi...@readypulse.comwrote:

 regarding your question on hbase support for high performance and
 consistency - i would say hbase is highly scalable and performant. how it
 does what it does can be understood by reading relevant chapters around
 architecture and design in the hbase book.

 with regards to ranking, i see your problem. but if you split the problem
 into hbase specific solution and solr based solution, you can achieve the
 results probably. may be you do the ranking and store the rank in hbase and
 then use solr to get the results and then use hbase as a lookup to get the
 rank. or you can put the rank as part of the document schema and index the
 rank too for range queries and such. is my understanding of your scenario
 wrong?

 thanks


 On Wed, Feb 22, 2012 at 9:51 AM, Bing Li lbl...@gmail.com wrote:

 Mr Gupta,

 Thanks so much for your reply!

 In my use cases, retrieving data by keyword is one of them. I think Solr
 is a proper choice.

 However, Solr does not provide a complex enough support to rank. And,
 frequent updating is also not suitable in Solr. So it is difficult to
 retrieve data randomly based on the values other than keyword frequency in
 text. In this case, I attempt to use HBase.

 But I don't know how HBase support high performance when it needs to keep
 consistency in a large scale distributed system.

 Now both of them are used in my system.

 I will check out ElasticSearch.

 Best regards,
 Bing


 On Thu, Feb 23, 2012 at 1:35 AM, T Vinod Gupta tvi...@readypulse.comwrote:

 Bing,
 Its a classic battle on whether to use solr or hbase or a combination of
 both. both systems are very different but there is some overlap in the
 utility. they also differ vastly when it compares to computation power,
 storage needs, etc. so in the end, it all boils down to your use case. you
 need to pick the technology that it best suited to your needs.
 im still not clear on your use case though.

 btw, if you haven't started using solr yet - then you might want to
 checkout ElasticSearch. I spent over a week researching between solr and ES
 and eventually chose ES due to its cool merits.

 thanks


 On Wed, Feb 22, 2012 at 9:31 AM, Ted Yu yuzhih...@gmail.com wrote:

 There is no secondary index support in HBase at the moment.

 It's on our road map.

 FYI

 On Wed, Feb 22, 2012 at 9:28 AM, Bing Li lbl...@gmail.com wrote:

  Jacques,
 
  Yes. But I still have questions about that.
 
  In my system, when users search with a keyword arbitrarily, the query
 is
  forwarded to Solr. No any updating operations but appending new
 indexes
  exist in Solr managed data.
 
  When I need to retrieve data based on ranking values, HBase is used.
 And,
  the ranking values need to be updated all the time.
 
  Is that correct?
 
  My question is that the performance must be low if keeping
 consistency in a
  large scale distributed environment. How does HBase handle this issue?
 
  Thanks so much!
 
  Bing
 
 
  On Thu, Feb 23, 2012 at 1:17 AM, Jacques whs...@gmail.com wrote:
 
   It is highly unlikely that you could replace Solr with HBase.
  They're
   really apples and oranges.
  
  
   On Wed, Feb 22, 2012 at 1:09 AM, Bing Li lbl...@gmail.com wrote:
  
   Dear all,
  
   I wonder how data in HBase is indexed? Now Solr is used in my
 system
   because data is managed in inverted index. Such an index is
 suitable to
   retrieve unstructured and huge amount of data. How does HBase deal
 with
   the
   issue? May I replaced Solr with HBase?
  
   Thanks so much!
  
   Best regards,
   Bing
  
  
  
 







How is Data Indexed in HBase?

2012-02-22 Thread Bing Li
Dear all,

I wonder how data in HBase is indexed. Solr is currently used in my system
because the data is managed in an inverted index; such an index is suitable
for retrieving unstructured and huge amounts of data. How does HBase deal
with this issue? May I replace Solr with HBase?

Thanks so much!

Best regards,
Bing


Re: Solr HBase - Re: How is Data Indexed in HBase?

2012-02-22 Thread Bing Li
Mr Gupta,

Thanks so much for your reply!

In my use cases, retrieving data by keyword is one of them. I think Solr is
a proper choice.

However, Solr does not provide complex enough support for ranking, and
frequent updating is also not a good fit for Solr. So it is difficult to
retrieve data based on values other than keyword frequency in the text. In
this case, I attempt to use HBase.

But I don't know how HBase supports high performance when it needs to keep
consistency in a large-scale distributed system.

Now both of them are used in my system.

I will check out ElasticSearch.

Best regards,
Bing


On Thu, Feb 23, 2012 at 1:35 AM, T Vinod Gupta tvi...@readypulse.comwrote:

 Bing,
 Its a classic battle on whether to use solr or hbase or a combination of
 both. both systems are very different but there is some overlap in the
 utility. they also differ vastly when it compares to computation power,
 storage needs, etc. so in the end, it all boils down to your use case. you
 need to pick the technology that it best suited to your needs.
 im still not clear on your use case though.

 btw, if you haven't started using solr yet - then you might want to
 checkout ElasticSearch. I spent over a week researching between solr and ES
 and eventually chose ES due to its cool merits.

 thanks


 On Wed, Feb 22, 2012 at 9:31 AM, Ted Yu yuzhih...@gmail.com wrote:

 There is no secondary index support in HBase at the moment.

 It's on our road map.

 FYI

 On Wed, Feb 22, 2012 at 9:28 AM, Bing Li lbl...@gmail.com wrote:

  Jacques,
 
  Yes. But I still have questions about that.
 
  In my system, when users search with a keyword arbitrarily, the query is
  forwarded to Solr. No any updating operations but appending new indexes
  exist in Solr managed data.
 
  When I need to retrieve data based on ranking values, HBase is used.
 And,
  the ranking values need to be updated all the time.
 
  Is that correct?
 
  My question is that the performance must be low if keeping consistency
 in a
  large scale distributed environment. How does HBase handle this issue?
 
  Thanks so much!
 
  Bing
 
 
  On Thu, Feb 23, 2012 at 1:17 AM, Jacques whs...@gmail.com wrote:
 
   It is highly unlikely that you could replace Solr with HBase.  They're
   really apples and oranges.
  
  
   On Wed, Feb 22, 2012 at 1:09 AM, Bing Li lbl...@gmail.com wrote:
  
   Dear all,
  
   I wonder how data in HBase is indexed? Now Solr is used in my system
   because data is managed in inverted index. Such an index is suitable
 to
   retrieve unstructured and huge amount of data. How does HBase deal
 with
   the
   issue? May I replaced Solr with HBase?
  
   Thanks so much!
  
   Best regards,
   Bing
  
  
  
 





Re: Sort by the number of matching terms (coord value)

2012-02-16 Thread Li Li
You can fool the Lucene scoring function: override each function such as
idf, queryNorm and lengthNorm and let them simply return 1.0f.
I don't know whether Lucene 4 will expose more details, but for 2.x/3.x
Lucene can only score with the vector space model and the formula can't be
replaced by users.
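
A rough Lucene 3.x sketch of that trick: a Similarity where the usual factors are flattened to 1.0 so the score is roughly the number of matching terms (treat it as an illustration, not a drop-in):

import org.apache.lucene.search.DefaultSimilarity;

public class MatchCountSimilarity extends DefaultSimilarity {
    @Override
    public float tf(float freq) { return 1.0f; }

    @Override
    public float idf(int docFreq, int numDocs) { return 1.0f; }

    @Override
    public float queryNorm(float sumOfSquaredWeights) { return 1.0f; }

    @Override
    public float lengthNorm(String fieldName, int numTokens) { return 1.0f; }

    @Override
    public float coord(int overlap, int maxOverlap) { return overlap; } // number of matching clauses
}
// usage: searcher.setSimilarity(new MatchCountSimilarity()); then sort by score as usual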

On Fri, Feb 17, 2012 at 10:47 AM, Nicholas Clark clark...@gmail.com wrote:

 Hi,

 I'm looking for a way to sort results by the number of matching terms.
 Being able to sort by the coord() value or by the overlap value that gets
 passed into the coord() function would do the trick. Is there a way I can
 expose those values to the sort function?

 I'd appreciate any help that points me in the right direction. I'm OK with
 making basic code modifications.

 Thanks!

 -Nick



Re: Can I rebuild an index and remove some fields?

2012-02-15 Thread Li Li
Great. I think you could make it a public tool; maybe others also need such
functionality.

On Thu, Feb 16, 2012 at 5:31 AM, Robert Stewart bstewart...@gmail.comwrote:

 I implemented an index shrinker and it works.  I reduced my test index
 from 6.6 GB to 3.6 GB by removing a single shingled field I did not
 need anymore.  I'm actually using Lucene.Net for this project so code
 is C# using Lucene.Net 2.9.2 API.  But basic idea is:

 Create an IndexReader wrapper that only enumerates the terms you want
 to keep, and that removes terms from documents when returning
 documents.

 Use the SegmentMerger to re-write each segment (where each segment is
 wrapped by the wrapper class), writing new segment to a new directory.
 Collect the SegmentInfos and do a commit in order to create a new
 segments file in new index directory

 Done - you now have a shrunk index with specified terms removed.

 Implementation uses separate thread for each segment, so it re-writes
 them in parallel.  Took about 15 minutes to do 770,000 doc index on my
 macbook.


 On Tue, Feb 14, 2012 at 10:12 PM, Li Li fancye...@gmail.com wrote:
  I have roughly read the codes of 4.0 trunk. maybe it's feasible.
 SegmentMerger.add(IndexReader) will add to be merged Readers
 merge() will call
   mergeTerms(segmentWriteState);
   mergePerDoc(segmentWriteState);
 
mergeTerms() will construct fields from IndexReaders
 for (int readerIndex = 0; readerIndex < mergeState.readers.size(); readerIndex++) {
   final MergeState.IndexReaderAndLiveDocs r =
  mergeState.readers.get(readerIndex);
   final Fields f = r.reader.fields();
   final int maxDoc = r.reader.maxDoc();
   if (f != null) {
 slices.add(new ReaderUtil.Slice(docBase, maxDoc, readerIndex));
 fields.add(f);
   }
   docBase += maxDoc;
 }
 So If you wrapper your IndexReader and override its fields() method,
  maybe it will work for merge terms.
 
 for DocValues, it can also override AtomicReader.docValues(). just
  return null for fields you want to remove. maybe it should
  traverse CompositeReader's getSequentialSubReaders() and wrapper each
  AtomicReader
 
 other things like term vectors norms are similar.
  On Wed, Feb 15, 2012 at 6:30 AM, Robert Stewart bstewart...@gmail.com
 wrote:
 
  I was thinking if I make a wrapper class that aggregates another
  IndexReader and filter out terms I don't want anymore it might work.
 And
  then pass that wrapper into SegmentMerger.  I think if I filter out
 terms
  on GetFieldNames(...) and Terms(...) it might work.
 
  Something like:
 
   HashSet<string> ignoredTerms=...;
 
  FilteringIndexReader wrapper=new FilterIndexReader(reader);
 
  SegmentMerger merger=new SegmentMerger(writer);
 
  merger.add(wrapper);
 
  merger.Merge();
 
 
 
 
 
  On Feb 14, 2012, at 1:49 AM, Li Li wrote:
 
   for method 2, delete is wrong. we can't delete terms.
 you also should hack with the tii and tis file.
  
   On Tue, Feb 14, 2012 at 2:46 PM, Li Li fancye...@gmail.com wrote:
  
   method1, dumping data
   for stored fields, you can traverse the whole index and save it to
   somewhere else.
   for indexed but not stored fields, it may be more difficult.
  if the indexed and not stored field is not analyzed(fields such as
   id), it's easy to get from FieldCache.StringIndex.
  But for analyzed fields, though theoretically it can be restored
 from
   term vector and term position, it's hard to recover from index.
  
   method 2, hack with metadata
   1. indexed fields
delete by query, e.g. field:*
   2. stored fields
 because all fields are stored sequentially. it's not easy to
  delete
   some fields. this will not affect search speed. but if you want to
 get
   stored fields,  and the useless fields are very long, then it will
 slow
   down.
 also it's possible to hack with it. but need more effort to
   understand the index file format  and traverse the fdt/fdx file.
  
 
 http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html
  
   this will give you some insight.
  
  
   On Tue, Feb 14, 2012 at 6:29 AM, Robert Stewart 
 bstewart...@gmail.com
  wrote:
  
   Lets say I have a large index (100M docs, 1TB, split up between 10
   indexes).  And a bunch of the stored and indexed fields are not
  used in
   search at all.  In order to save memory and disk, I'd like to
 rebuild
  that
   index *without* those fields, but I don't have original documents to
   rebuild entire index with (don't have the full-text anymore, etc.).
  Is
   there some way to rebuild or optimize an existing index with only a
  sub-set
   of the existing indexed fields?  Or alternatively is there a way to
  avoid
   loading some indexed fields at all ( to avoid loading term infos and
  terms
   index ) ?
  
   Thanks
   Bob
  
  
  
 
 



Re: Can I rebuild an index and remove some fields?

2012-02-14 Thread Li Li
I have roughly read the code of the 4.0 trunk; this may be feasible.
SegmentMerger.add(IndexReader) adds the readers to be merged, and merge()
will call:
  mergeTerms(segmentWriteState);
  mergePerDoc(segmentWriteState);

mergeTerms() constructs the fields from the IndexReaders:
for (int readerIndex = 0; readerIndex < mergeState.readers.size(); readerIndex++) {
  final MergeState.IndexReaderAndLiveDocs r = mergeState.readers.get(readerIndex);
  final Fields f = r.reader.fields();
  final int maxDoc = r.reader.maxDoc();
  if (f != null) {
    slices.add(new ReaderUtil.Slice(docBase, maxDoc, readerIndex));
    fields.add(f);
  }
  docBase += maxDoc;
}
So if you wrap your IndexReader and override its fields() method, it may
work for merging terms.

For DocValues, the wrapper can also override AtomicReader.docValues() and
just return null for the fields you want to remove. It should probably
traverse the CompositeReader's getSequentialSubReaders() and wrap each
AtomicReader.

Other things like term vectors and norms are handled similarly.
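
As a rough, untested sketch against the 3.x API (the FilterIndexReader approach
from the reply quoted below), a wrapper might filter the field names and the
term dictionary like this. The class name and dropped-field set are made up for
illustration, and norms, termDocs/termPositions for the dropped fields, term
vectors and stored fields would all need similar treatment:

  import java.io.IOException;
  import java.util.Collection;
  import java.util.HashSet;
  import java.util.Set;

  import org.apache.lucene.index.FilterIndexReader;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.TermEnum;

  // Hides the given fields' terms from whoever consumes the wrapped reader.
  public class FieldStrippingReader extends FilterIndexReader {

    private final Set<String> droppedFields;

    public FieldStrippingReader(IndexReader in, Set<String> droppedFields) {
      super(in);
      this.droppedFields = new HashSet<String>(droppedFields);
    }

    @Override
    public Collection<String> getFieldNames(IndexReader.FieldOption option) {
      Collection<String> names = new HashSet<String>(in.getFieldNames(option));
      names.removeAll(droppedFields);
      return names;
    }

    @Override
    public TermEnum terms() throws IOException {
      final TermEnum delegate = in.terms();
      // Skip every term that belongs to one of the dropped fields.
      return new FilterTermEnum(delegate) {
        @Override
        public boolean next() throws IOException {
          while (delegate.next()) {
            if (!droppedFields.contains(delegate.term().field())) {
              return true;
            }
          }
          return false;
        }
      };
    }
  }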
On Wed, Feb 15, 2012 at 6:30 AM, Robert Stewart bstewart...@gmail.com wrote:

 I was thinking if I make a wrapper class that aggregates another
 IndexReader and filter out terms I don't want anymore it might work.   And
 then pass that wrapper into SegmentMerger.  I think if I filter out terms
 on GetFieldNames(...) and Terms(...) it might work.

 Something like:

  HashSet<String> ignoredTerms = ...;

 FilteringIndexReader wrapper=new FilterIndexReader(reader);

 SegmentMerger merger=new SegmentMerger(writer);

 merger.add(wrapper);

 merger.Merge();





 On Feb 14, 2012, at 1:49 AM, Li Li wrote:

  for method 2, delete is wrong. we can't delete terms.
you also should hack with the tii and tis file.
 
  On Tue, Feb 14, 2012 at 2:46 PM, Li Li fancye...@gmail.com wrote:
 
  method1, dumping data
  for stored fields, you can traverse the whole index and save it to
  somewhere else.
  for indexed but not stored fields, it may be more difficult.
 if the indexed and not stored field is not analyzed(fields such as
  id), it's easy to get from FieldCache.StringIndex.
 But for analyzed fields, though theoretically it can be restored from
  term vector and term position, it's hard to recover from index.
 
  method 2, hack with metadata
  1. indexed fields
   delete by query, e.g. field:*
  2. stored fields
because all fields are stored sequentially. it's not easy to
 delete
  some fields. this will not affect search speed. but if you want to get
  stored fields,  and the useless fields are very long, then it will slow
  down.
also it's possible to hack with it. but need more effort to
  understand the index file format  and traverse the fdt/fdx file.
 
 http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html
 
  this will give you some insight.
 
 
  On Tue, Feb 14, 2012 at 6:29 AM, Robert Stewart bstewart...@gmail.com
 wrote:
 
  Lets say I have a large index (100M docs, 1TB, split up between 10
  indexes).  And a bunch of the stored and indexed fields are not
 used in
  search at all.  In order to save memory and disk, I'd like to rebuild
 that
  index *without* those fields, but I don't have original documents to
  rebuild entire index with (don't have the full-text anymore, etc.).  Is
  there some way to rebuild or optimize an existing index with only a
 sub-set
  of the existing indexed fields?  Or alternatively is there a way to
 avoid
  loading some indexed fields at all ( to avoid loading term infos and
 terms
  index ) ?
 
  Thanks
  Bob
 
 
 




Re: New segment file created too often

2012-02-13 Thread Li Li
 Commit is called
after adding each document


 You should add a reasonable batch of documents and then call a single
commit; commit is a costly operation.
 If you need newly added documents to be searchable right away, you could use NRT.
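
A minimal SolrJ sketch of the batching idea (the URL, field names and batch
size are illustrative; the commitWithin option on updates, or NRT, would be
other ways to bound visibility latency):

  import java.util.ArrayList;
  import java.util.List;

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class BatchedIndexer {
    public static void main(String[] args) throws Exception {
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

      List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
      for (int i = 0; i < 10000; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-" + i);
        doc.addField("text_t", "some content " + i);
        batch.add(doc);

        if (batch.size() == 500) {
          server.add(batch);     // send a batch of documents ...
          server.commit();       // ... and commit once per batch, not per document
          batch.clear();
        }
      }
      if (!batch.isEmpty()) {
        server.add(batch);
        server.commit();
      }
    }
  }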

On Tue, Feb 14, 2012 at 12:47 AM, Huy Le hu...@springpartners.com wrote:

 Hi,

 I am using solr 3.5.  I seeing solr keeps creating new segment files (1MB
 files) so often that it triggers segment merge about every one minute. I
 search the news archive, but could not find any info on this issue.  I am
 indexing about 10 docs of less 2KB each every second.  Commit is called
 after adding each document. Relevant config params are:

  <mergeFactor>10</mergeFactor>
  <ramBufferSizeMB>1024</ramBufferSizeMB>
  <maxMergeDocs>2147483647</maxMergeDocs>

 What might be triggering this frequent new segment files creation?  Thanks!

 Huy

 --
 Huy Le
 Spring Partners, Inc.
 http://springpadit.com



Re: New segment file created too often

2012-02-13 Thread Li Li
As far as I know, there are three situations in which buffered documents are
flushed to a new segment: the RAM buffer for the posting data structures is
used up; the number of added documents exceeds the configured threshold; or
a segment has accumulated many deletions. But with your configuration it does
not look likely to flush many small segments:

<ramBufferSizeMB>1024</ramBufferSizeMB>
<maxMergeDocs>2147483647</maxMergeDocs>
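
Roughly speaking, these map onto the Lucene-level flush triggers; a small
illustrative sketch (the values and disabled triggers are arbitrary, not a
recommendation):

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.util.Version;

  public class FlushTriggers {
    public static IndexWriterConfig config() {
      IndexWriterConfig conf =
          new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35));
      conf.setRAMBufferSizeMB(1024.0);                                      // trigger 1: RAM buffer is full
      conf.setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH);        // trigger 2: buffered doc count
      conf.setMaxBufferedDeleteTerms(IndexWriterConfig.DISABLE_AUTO_FLUSH); // trigger 3: buffered delete terms
      return conf;
    }
  }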
On Tue, Feb 14, 2012 at 1:10 AM, Huy Le hu...@springpartners.com wrote:

 Hi,

 I am using solr 3.5.  As I understood it, NRT is a solr 4 feature, but solr
 4 is not released yet.

 I understand commit after adding each document is expensive, but the
 application requires that documents be available after adding to the index.

 What I don't understand is why new segment files are created so often.
 Are the commit calls triggering new segment files being created?  I don't
 see this behavior in another environment of the same version of solr.

 Huy

 On Mon, Feb 13, 2012 at 11:55 AM, Li Li fancye...@gmail.com wrote:

   Commit is called
  after adding each document
 
 
   you should add enough documents and then calling a commit. commit is a
  cost operation.
   if you want to get latest feeded documents, you could use NRT
 
  On Tue, Feb 14, 2012 at 12:47 AM, Huy Le hu...@springpartners.com
 wrote:
 
   Hi,
  
   I am using solr 3.5.  I seeing solr keeps creating new segment files
  (1MB
   files) so often that it triggers segment merge about every one minute.
 I
   search the news archive, but could not find any info on this issue.  I
 am
   indexing about 10 docs of less 2KB each every second.  Commit is called
   after adding each document. Relevant config params are:
  
    <mergeFactor>10</mergeFactor>
    <ramBufferSizeMB>1024</ramBufferSizeMB>
    <maxMergeDocs>2147483647</maxMergeDocs>
  
   What might be triggering this frequent new segment files creation?
   Thanks!
  
   Huy
  
   --
   Huy Le
   Spring Partners, Inc.
   http://springpadit.com
  
 



 --
 Huy Le
 Spring Partners, Inc.
 http://springpadit.com



Re: New segment file created too often

2012-02-13 Thread Li Li
Can you post your config file?
I found there are two places to configure ramBufferSizeMB in the latest svn
of 3.6's example solrconfig.xml. Try modifying them both:

  <indexDefaults>

    <useCompoundFile>false</useCompoundFile>

    <mergeFactor>10</mergeFactor>
    <!-- Sets the amount of RAM that may be used by Lucene indexing
         for buffering added documents and deletions before they are
         flushed to the Directory.  -->
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <!-- If both ramBufferSizeMB and maxBufferedDocs is set, then
         Lucene will flush based on whichever limit is hit first.
      -->
    <!-- <maxBufferedDocs>1000</maxBufferedDocs> -->

    <maxFieldLength>1</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>

    ...
    <!-- <termIndexInterval>256</termIndexInterval> -->
  </indexDefaults>

  <!-- Main Index

       Values here override the values in the indexDefaults section
       for the main on disk index.
    -->
  <mainIndex>

    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <mergeFactor>10</mergeFactor>
    ...
  </mainIndex>

On Tue, Feb 14, 2012 at 1:10 AM, Huy Le hu...@springpartners.com wrote:

 Hi,

 I am using solr 3.5.  As I understood it, NRT is a solr 4 feature, but solr
 4 is not released yet.

 I understand commit after adding each document is expensive, but the
 application requires that documents be available after adding to the index.

 What I don't understand is why new segment files are created so often.
 Are the commit calls triggering new segment files being created?  I don't
 see this behavior in another environment of the same version of solr.

 Huy

 On Mon, Feb 13, 2012 at 11:55 AM, Li Li fancye...@gmail.com wrote:

   Commit is called
  after adding each document
 
 
   you should add enough documents and then calling a commit. commit is a
  cost operation.
   if you want to get latest feeded documents, you could use NRT
 
  On Tue, Feb 14, 2012 at 12:47 AM, Huy Le hu...@springpartners.com
 wrote:
 
   Hi,
  
   I am using solr 3.5.  I seeing solr keeps creating new segment files
  (1MB
   files) so often that it triggers segment merge about every one minute.
 I
   search the news archive, but could not find any info on this issue.  I
 am
   indexing about 10 docs of less 2KB each every second.  Commit is called
   after adding each document. Relevant config params are:
  
    <mergeFactor>10</mergeFactor>
    <ramBufferSizeMB>1024</ramBufferSizeMB>
    <maxMergeDocs>2147483647</maxMergeDocs>
  
   What might be triggering this frequent new segment files creation?
   Thanks!
  
   Huy
  
   --
   Huy Le
   Spring Partners, Inc.
   http://springpadit.com
  
 



 --
 Huy Le
 Spring Partners, Inc.
 http://springpadit.com



Re: Can I rebuild an index and remove some fields?

2012-02-13 Thread Li Li
Method 1: dumping the data
For stored fields, you can traverse the whole index and save the values
somewhere else.
For indexed-but-not-stored fields it is more difficult. If such a field is
not analyzed (fields such as id), its values are easy to get from
FieldCache.StringIndex. For analyzed fields, though the values can in theory
be restored from term vectors and term positions, they are hard to recover
from the index.

Method 2: hacking the index metadata
1. indexed fields
   delete by query, e.g. field:*
2. stored fields
   Because all fields are stored sequentially, it is not easy to delete some
   of them. Leaving them in place will not affect search speed, but if you
   retrieve stored fields and the useless fields are very long, retrieval
   will slow down.
   It is possible to hack around this, but it takes more effort to understand
   the index file format and to traverse the fdt/fdx files.
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html

This will give you some insight.
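
A rough sketch of method 1 against Lucene 3.5: copy documents into a fresh
index from their stored values only, dropping the unwanted fields. Only stored
values survive this round trip (analyzed, unstored fields are lost, as noted
above); the field names and analyzer below are placeholders:

  import java.io.File;

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.util.Version;

  public class DumpStoredFields {
    public static void main(String[] args) throws Exception {
      Directory src = FSDirectory.open(new File(args[0]));
      Directory dst = FSDirectory.open(new File(args[1]));
      String[] uselessFields = {"body", "html"};   // placeholders for the fields to drop

      IndexReader reader = IndexReader.open(src);
      IndexWriter writer = new IndexWriter(dst,
          new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));
      try {
        for (int i = 0; i < reader.maxDoc(); i++) {
          if (reader.isDeleted(i)) {
            continue;                        // skip deleted documents
          }
          Document doc = reader.document(i); // only stored fields come back here
          for (String field : uselessFields) {
            doc.removeFields(field);
          }
          writer.addDocument(doc);
        }
      } finally {
        writer.close();
        reader.close();
      }
    }
  }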

On Tue, Feb 14, 2012 at 6:29 AM, Robert Stewart bstewart...@gmail.com wrote:

 Lets say I have a large index (100M docs, 1TB, split up between 10
 indexes).  And a bunch of the stored and indexed fields are not used in
 search at all.  In order to save memory and disk, I'd like to rebuild that
 index *without* those fields, but I don't have original documents to
 rebuild entire index with (don't have the full-text anymore, etc.).  Is
 there some way to rebuild or optimize an existing index with only a sub-set
 of the existing indexed fields?  Or alternatively is there a way to avoid
 loading some indexed fields at all ( to avoid loading term infos and terms
 index ) ?

 Thanks
 Bob


Re: Can I rebuild an index and remove some fields?

2012-02-13 Thread Li Li
For method 2, the delete-by-query suggestion is wrong; we can't delete terms
that way. You would also have to hack the .tii and .tis files.

On Tue, Feb 14, 2012 at 2:46 PM, Li Li fancye...@gmail.com wrote:

 method1, dumping data
 for stored fields, you can traverse the whole index and save it to
 somewhere else.
 for indexed but not stored fields, it may be more difficult.
 if the indexed and not stored field is not analyzed(fields such as
 id), it's easy to get from FieldCache.StringIndex.
 But for analyzed fields, though theoretically it can be restored from
 term vector and term position, it's hard to recover from index.

 method 2, hack with metadata
 1. indexed fields
   delete by query, e.g. field:*
 2. stored fields
because all fields are stored sequentially. it's not easy to delete
 some fields. this will not affect search speed. but if you want to get
 stored fields,  and the useless fields are very long, then it will slow
 down.
also it's possible to hack with it. but need more effort to
 understand the index file format  and traverse the fdt/fdx file.
 http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/fileformats.html

 this will give you some insight.


 On Tue, Feb 14, 2012 at 6:29 AM, Robert Stewart bstewart...@gmail.com wrote:

 Lets say I have a large index (100M docs, 1TB, split up between 10
 indexes).  And a bunch of the stored and indexed fields are not used in
 search at all.  In order to save memory and disk, I'd like to rebuild that
 index *without* those fields, but I don't have original documents to
 rebuild entire index with (don't have the full-text anymore, etc.).  Is
 there some way to rebuild or optimize an existing index with only a sub-set
 of the existing indexed fields?  Or alternatively is there a way to avoid
 loading some indexed fields at all ( to avoid loading term infos and terms
 index ) ?

 Thanks
 Bob





more sql-like commands for solr

2012-02-07 Thread Li Li
hi all,
We have used Solr to provide search services in many products, and for each
product we have to write some configuration and query expressions. Our users
are not used to this. They are familiar with SQL, and they may describe their
needs like this: "I want a query that searches books whose title contains
java, groups the books by publishing year, and orders them by matching score
and freshness, where the weight of score is 2 and the weight of freshness
is 1."
Maybe they would be happier if they could use SQL-like statements to convey
their needs:
select * from books where title contains java group by pub_year order
by score^2, freshness^1
They might also like to insert or delete documents with something like
"delete from books where title contains java and pub_year between 2011 and
2012".
We could define a language similar to SQL and translate it to a Solr query
string such as .../select/?q=+title:java^2 +pub_year:2011
This would be roughly what Apache Hive is for Hadoop.
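
Purely as an illustration of the mapping (the field names and grouping details
are assumptions about a hypothetical schema, not a real translator), the
example statement could compile down to a SolrJ query along these lines:

  import org.apache.solr.client.solrj.SolrQuery;

  public class SqlLikeExample {
    // "select * from books where title contains java
    //  group by pub_year order by score^2, freshness^1"
    public static SolrQuery bookQuery() {
      SolrQuery q = new SolrQuery();
      q.setQuery("+title:java");
      // The score^2, freshness^1 weighting would more realistically be a boost
      // function than a sort; a plain sort is used here only to keep the sketch short.
      q.addSortField("score", SolrQuery.ORDER.desc);
      q.addSortField("freshness", SolrQuery.ORDER.desc);
      // "group by pub_year" -> Solr result grouping / field collapsing
      q.set("group", "true");
      q.set("group.field", "pub_year");
      return q;
    }
  }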


Re: Chinese Phonetic search

2012-02-07 Thread Li Li
You can convert Chinese words to pinyin and use n-grams to search for
phonetically similar words.
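
A sketch of what such a field type might look like in schema.xml; the pinyin
filter class is a placeholder for whatever third-party or custom converter is
plugged in, while solr.NGramFilterFactory is stock:

  <fieldType name="text_pinyin" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <!-- placeholder: a filter that rewrites Chinese tokens into pinyin -->
      <filter class="com.example.PinyinTokenFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="4"/>
    </analyzer>
  </fieldType>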

On Wed, Feb 8, 2012 at 11:10 AM, Floyd Wu floyd...@gmail.com wrote:

 Hi there,

 Does anyone here ever implemented phonetic search especially with
 Chinese(traditional/simplified) using SOLR or Lucene?

 Please share some thought or point me a possible solution. (hint me search
 keywords)

 I've searched and read lot of related articles but have no luck.

 Many thanks.

 Floyd


