Re: java.lang.StackOverflowError if pass long string in q parameter
Hi Kumar, The problem you have here is StackOverflowError which is not related to the character limit of the q parameter. First of all, consider using pagination to fetch data from Solr. Secondly, share your configuration settings to startup Solr and how much index you have to check whether you need a tuning or not. Kind Regards, Furkan KAMACI On Thu, Aug 6, 2020 at 8:46 AM kumar gaurav wrote: > HI > > I am getting the following exception if passing a long String in q > parameter . > > > q=uid:TCYY1EGPR38SX7EZ+OR+uid:TCYY1EGPR6M1ARAZ+OR+uid:TCYY1EGPR3NTTO3Z+OR+uid:TCYY1EGPR8L7XDZZ+OR+uid:TSHO3J0AGFUI9J3Z+OR+uid:TSHO3J0AI1CJJ2AZ+OR+uid:TSHO3J0AI4FZTBWZ+OR+uid:TDRE3J13G97WNCLZ+OR+uid:TCYY1EGPRA72BGHZ+OR+uid:TCYY1EGPR9EQUJYZ+OR+uid:TCYY1EGPRCTJXQPZ+OR+uid:TCYY1EGPR6RXPP0Z+OR+uid:TDRE3J13GBUSFV4Z+OR+uid:TTSH3FLDI7NJA8WZ+OR+uid:TERG3LIS70URWI5Z+OR+uid:TERG3LIS70QKOJAZ+OR+uid:TCYY1EGPR9EVMD5Z+OR+uid:TCYY1EGPRC8CRJ2Z+OR+uid:TCYY1EGPRGMD8MYZ+OR+uid:TCYY1EGPRM5OP68Z+OR+uid:TERG3LIS71AU8ZAZ+OR+uid:TERG3LIS719WRJWZ+OR+uid:THAQ3LIZCJ7TSEUZ+OR+uid:TERG3LIS70Q2O8IZ+OR+uid:TCYY1EGPRGXN2ZIZ+OR+uid:TCYY1EGPRGYTH3FZ+OR+uid:TCYY1EGPRK1JFUQZ+OR+uid:TCYY1EGPRM3JNN0Z+OR+uid:TERG3LIS70QPC4FZ+OR+uid:TBBA3LKKUOLVK89Z+OR+uid:TSOC1HULKNGBDUEZ+OR+uid:TSOC1HULKMTEOGTZ+OR+uid:TCYY1EGPRF93SE8Z+OR+uid:TCYY1EGPREUHNVMZ+OR+uid:TCYY1EGPRESMC0MZ+OR+uid:TCYY1EGPRDZE49OZ+OR+uid:THMB1OMS16B3OCPZ+OR+uid:TSOC1NS0MMMNAXOZ+OR+uid:TSOC1NS0GVJHP82Z+OR+uid:TSOC1NS0H3QAQQ7Z+OR+uid:TCYY2BESMSQWQBFZ+OR+uid:TCYY2BESMTJMA60Z+OR+uid:TCYY2BESN9EK5GFZ+OR+uid:TCYY2BESN9ER8PYZ+OR+uid:TSOC1NS0LBFBEAUZ+OR+uid:THAT2AOL6U500A1Z+OR+uid:THAT2AON5W2HVY9Z+OR+uid:THAT2AOL86LNHYTZ+OR+uid:TCYY2BESMO42C3GZ+OR+uid:TCYY1EGPSZSFLLTZ+OR+uid:TCYY1EGPT0X5B3DZ+OR+uid:THAT2AOL8GMD7O4Z+OR+uid:TSHT3FL6STFG1DEZ+OR+uid:TTSH3J0X6W92MPYZ+OR+uid:TTSH3J0X6SKNCECZ+OR+uid:TCYY1EGPS2J2UF4Z+OR+uid:TCYY1EGPT4HFILGZ+OR+uid:TCYY1EGPRQQQH7QZ+OR+uid:TCYY1EGPRZ72UA6Z+OR+uid:TSHT3FL6SWUTR9OZ+OR+uid:TTSH3J0X759RPQRZ+OR+uid:TTSH3J0X7ES5BR8Z+OR+uid:TTSH3J0X7CSXHAYZ+OR+uid:TCYY1EGPT74CXJMZ+OR+uid:TCYY1EGPS00631RZ+OR+uid:TCYY1EGPS0YU45YZ+OR+uid:TCYY1EGPS4BXXEFZ+OR+uid:TTSH3J0X7HFX0XMZ+OR+uid:TTSH3J0X1AY49RBZ+OR+uid:TTSH3J0X1B36WWWZ+OR+uid:TTSH3J0X1IOH3I8Z+OR+uid:TCYY1EGPSFA5BV2Z+OR+uid:TCYY1EGPSJ43BQNZ+OR+uid:TDAASAPEOHUVZZ+OR+uid:TCYY1EGPSPUZD2PZ+OR+uid:TTSH3J0X3B4S8E9Z+OR+uid:TTSH3J0X6O6TKRQZ+OR+uid:TBRF3LJHIFUI9G6Z+OR+uid:TTSH3J0X4O4S6AUZ+OR+uid:TCYY1EGPSPJHP2NZ+OR+uid:TCYY1EGPSQ95JCCZ+OR+uid:TCYY1EGPSSFR7Z0Z+OR+uid:TCYY1EGPSUYSCNKZ+OR+uid:TTSH3J0X65JG54CZ+OR+uid:TTSH3J0X6CS2ZAXZ+OR+uid:TTSH3J0X6HX537OZ+OR+uid:TTSH3J0X6PP1YGSZ+OR+uid:TCYY1EGPSWN05FGZ+OR+uid:TCYY1EGPSYB513WZ+OR+uid:TCYY1EGPSZR3X2SZ+OR+uid:TCYY1EGPT21MLB5Z+OR+uid:TBRF3LJHIFUOGPPZ+OR+uid:TTSH3J0X1TT376ZZ+OR+uid:TTSH3J0X4HE2ERLZ+OR+uid:TTSH3J0X39NEGZYZ+OR+uid:TCYY1EGPT4ZMPX4Z+OR+uid:TCCHSB60XT4YLZ+OR+uid:TCCHSB61WL7AZZ+OR+uid:TCYYSAUS1XIV3Z+OR+uid:TTSH3J0X6KMH7M2Z+OR+uid:TTSH3J0X1I5FYDGZ+OR+uid:TTSH3J0X4MISXH4Z+OR+uid:TCCHSB60XMUV1Z+OR+uid:TCCHSB61HK0B7Z+OR+uid:TCCHSB61VT84HZ+OR+uid:TCCHSB61ECHWDZ+OR+uid:TTSH3J0X1DU668XZ+OR+uid:TTSH3J0X1QGEU28Z+OR+uid:TTSH3J0X4BCEM0UZ+OR+uid:TTSH3J0X4MLHNIMZ+OR+uid:TCCHSB61E6Y87Z+OR+uid:TCYDSA2IT31VEZ+OR+uid:TCYDSA2IVH6HBZ+OR+uid:TDAASAPG0ADD5Z+OR+uid:TTSH3J0X4SZZWY7Z+OR+uid:TTSH3J0X36NM6Y7Z+OR+uid:TDAASAPFOY8EKZ+OR+uid:TDAASAPMOVIV5Z+OR+uid:TDAASAPI7JPNUZ+OR+uid:TDAASAPHV0UKXZ+OR+uid:TDAASAPFNE1HLZ+OR+uid:TDAASAPLVL68OZ+OR+uid:TDAASAPMLS2YXZ+OR+uid:TMOHS9OKD987QZ+OR+uid:TKKT1AL3XKSUWK4Z+OR+uid:TDAASAPEK2QUWZ+OR+uid:TDAASAPEL75NWZ+OR+uid:TDAASAPF8SZJSZ+OR+uid:TDAASAPBBDB7LZ+OR+uid:TKKT1AL3
YGBBT6WZ+OR+uid:TKKT1AL3ZC37N63Z+OR+uid:TERG1F6W6ULALO6Z+OR+uid:TERG1F6W6V16EJOZ+OR+uid:TDAASAPF2MO5CZ+OR+uid:TCYY2BG5PF8FQEYZ+OR+uid:TCYY2BG5QA8QLLMZ+OR+uid:TCYY2BG5R1YBCSAZ+OR+uid:TERG1F6W6V1P63TZ+OR+uid:TERG1F6W6VDJOHPZ+OR+uid:TERG1F6W6VP70CYZ+OR+uid:TERG1F6W6WX5D8KZ+OR+uid:TCYY2BG5REQ3IJHZ+OR+uid:TCYY2BG5RRFVUGDZ+OR+uid:TDAASAPDZ31GKZ+OR+uid:TDAASAPH1HNF1Z+OR+uid:TERG1F6W6XQ5UWYZ+OR+uid:TERG1F6W71MQDF5Z+OR+uid:TERG1F6W736SVVJZ+OR+uid:TRNG1F6W9NO8IW7Z+OR+uid:TDAASAPCC56WXZ+OR+uid:TDAASAPE9IZZ0Z+OR+uid:TDAASAPHVBD96Z+OR+uid:TDAASAPIDRAJ6Z+OR+uid:TRNG1F6W9NR1AJNZ+OR+uid:TRNG1F6W9O4U8GRZ+OR+uid:TRNG1F6W9OJE1CJZ+OR+uid:TRNG1F6W9PQJQGUZ+OR+uid:TDPYS9W9CZH9OZ+OR+uid:TSBNSB3TFCCEJZ+OR+uid:TMOUSB8BSI8VZZ+OR+uid:THDPVN9E24NKWZ+OR+uid:TRNG1F6W9PW0F9IZ+OR+uid:TRNGTB9IGWYVVZ+OR+uid:TWAT2FNTMY5NML5Z+OR+uid:TWAT2FNTMY5S3JAZ+OR+uid:THDPVN9DMRA5PZ+OR+uid:TKEISA5KVQB3SZ+OR+uid:TLTB2HENA46KWL8Z+OR+uid:TLTBSALTEXOSKZ+OR+uid:TWAT2FNTONX41SZZ+OR+uid:TWAT2FNTOODV5F8Z+OR+uid:TWAT2FNTOOLD7WKZ > > { > "error":{ > "msg":"java.lang.StackOverflowError", > "trace":"java.lang.RuntimeException: > java.lang.StackOverflowError\n\tat > org.apache.solr.servlet.HttpSolrCall.sendError(HttpSolrCall.java:662)\n\tat > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:530)\n\tat > > org.apache.solr.servlet.SolrDispatchFilter.doF
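For a query built from hundreds of OR'ed ids like the one above, a sketch of a lighter alternative (assuming SolrJ, a collection named "collection1", and an existing uids list; all illustrative): the terms query parser avoids the deeply nested boolean query that can blow the stack, and sending the request as POST sidesteps URI length limits:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.response.QueryResponse;

// one flat terms lookup instead of hundreds of OR clauses
SolrQuery query = new SolrQuery("{!terms f=uid}" + String.join(",", uids));
QueryResponse rsp = client.query("collection1", query, SolrRequest.METHOD.POST);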
Re: Solr Query
Hi Swetha, Given URL is encoded. So, you can decode it before analyzing. Plus character is used for whitespaces when you encode a URL and minus sign represents a negative query in Solr. Kind Regards, Furkan KAMACI On Tue, Jul 7, 2020 at 9:16 PM swetha vemula wrote: > Hi, > > I have an URL and I want to break this down and run it in the admin console > but I am not what is ++ and - represents in the query. > > select?q=(StartPublish%3a%5b*+TO+-12-31T23%3a59%3a59.999Z%5d++-Content%3a(Birthdays%5c%2fAnniversaries))++-FriendlyUrl%3a(*%2farchive%2f*))++((Title_NGram%3a(swetha))%5e500+OR+(MetaTitle_NGram%3a(swetha))%5e400+OR+(MetaKeywords_NGram%3a(swetha))%5e300+OR+(MetaDescription_NGram%3a(swetha))%5e200+OR+(Content_NGram%3a(swetha))%5e1))++(ACL%3a((Everyone)+OR+(MIDCO410%5c%5cMidco%5c-AllEmployees)+OR+(MIDCO410%5c%5cMidco%5c-DotNetDevelopers)+OR+(MIDCO410%5c%5cMidco%5c-WebAdmins)+OR+(MIDCO410%5c%5cMidco%5c-Source%5c-Admin)&start=0&rows=1&wt=xml&version=2.2 > > Thank You, > Swetha. >
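For reference, a quick way to decode such a URL (a minimal Java sketch; the encodedUrl variable is illustrative):

import java.net.URLDecoder;

String decoded = URLDecoder.decode(encodedUrl, "UTF-8"); // declares UnsupportedEncodingException
System.out.println(decoded);

After decoding, %3a becomes ':', %5b and %5d become '[' and ']', %5c becomes '\', %5e becomes '^' (a boost such as ^500), and each '+' becomes a space, so '++' is simply two spaces between clauses; the remaining '-' signs are Solr's prohibited-clause (negation) operator.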
Re: Solr heap Old generation grows and it is not recovered by G1GC
Hi Reinaldo, Which version of Solr do you use and could you share your cache settings? On the other hand, did you check here: https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems Kind Regards, Furkan KAMACI On Thu, Jun 25, 2020 at 11:09 PM Odysci wrote: > Hi, > > I have a solrcloud setup with 12GB heap and I've been trying to optimize it > to avoid OOM errors. My index has about 30million docs and about 80GB > total, 2 shards, 2 replicas. > > In my testing setup I submit multiple queries to solr (same node), > sequentially, and with no overlap between the documents returned in each > query (so docs do not need to be kept in cache) > > When the queries return a smallish number of docs (say, below 1000), the > heap behavior seems "normal". Monitoring the gc log I see that young > generation grows then when GC kicks in, it goes considerably down. And the > old generation grows just a bit. > > However, at some point i have a query that returns over 300K docs (for a > total size of approximately 1GB). At this very point the OLD generation > size grows (almost by 2GB), and it remains high for all remaining time. > Even as new queries are executed, the OLD generation size does not go down, > despite multiple GC calls done afterwards. > > Can anyone shed some light on this behavior? > > I'm using the following GC options: > GC_TUNE=" \ > > -XX:+UseG1GC \ > > -XX:+PerfDisableSharedMem \ > > -XX:+ParallelRefProcEnabled \ > > -XX:G1HeapRegionSize=4m \ > > -XX:MaxGCPauseMillis=250 \ > > -XX:InitiatingHeapOccupancyPercent=75 \ > > -XX:+UseLargePages \ > > -XX:+AggressiveOpts \ > > " > Thanks > Reinaldo >
Re: Unsubscribe me
Hi Shashikant, You can tell me if you need help. By the way, one can use solr-user-ow...@lucene.apache.org for such kinds of questions so as not to disturb the user mailing list. Kind Regards, Furkan KAMACI On Sat, Jun 20, 2020 at 12:53 PM Erick Erickson wrote: > Follow the instructions here: > http://lucene.apache.org/solr/community.html#mailing-lists-irc. You must > use the _exact_ same e-mail as you used to subscribe. > > If the initial try doesn't work and following the suggestions at the > "problems" link doesn't work for you, let us know. But note you need to > show us the _entire_ return header to allow anyone to diagnose the problem. > > Best, > Erick > > > > On Jun 20, 2020, at 3:22 AM, Shashikant Vaishnav < > shashikantvaish...@gmail.com> wrote: > > > > Unsubscribe please > >
Re: Solr Terms browsing in descending order
Hi Jigar, Is that a numeric field or not? By the way, have you checked the terms.sort parameter or json facet sort parameter? Kind Regards, Furkan KAMACI On Mon, Jun 1, 2020 at 11:37 PM Jigar Gajjar wrote: > Hello, > is it possible to retrieve index terms in the descending order using > terms handler, right now we get all terms in ascending order. > Thanks,Jigar Gajjar >
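If the terms component falls short, a sketch with the JSON Facet API (collection, handler, and field names are illustrative):

curl http://localhost:8983/solr/mycollection/query -d 'q=*:*&rows=0&json.facet={
  terms_desc : { type: terms, field: myfield, limit: 100, sort: "index desc" }
}'

sort: "index desc" returns buckets in descending term order, whereas terms.sort in the terms component only offers count order and ascending index order.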
Re: Alternate Fields for Unified Highlighter
Hi David, Thanks for the response! I use Unified Highlighter combined with maxAnalyzedChars to accomplish my needs. I'll file an issue and PR for it! Kind Regards, Furkan KAMACI On Fri, May 22, 2020 at 11:25 PM David Smiley wrote: > Feel free to file an issue; I know it's not supported. I also don't think > it's a big deal because you can just ask Solr to return the > "alternateField", thus letting the client side choose when to use that. I > suppose it might be large, so I can imagine a concern there. It'd be nice > if Solr had a DocTransformer to accomplish that. > > I know it's been awhile; I'm curious how the UH has been working for you, > assuming you are using it. > > ~ David Smiley > Apache Lucene/Solr Search Developer > http://www.linkedin.com/in/davidwsmiley > > > On Sun, Jun 2, 2019 at 6:47 AM Furkan KAMACI > wrote: > > > Hi All, > > > > I want to switch to Unified Highlighter due to performance reasons for my > > Solr 7.6. I was using these fields: > > > > solrQuery.addHighlightField("content_*") > > .set("f.content_en.hl.alternateField", "content") > > .set("f.content_es.hl.alternateField", "content") > > .set("hl.useFastVectorHighlighter", "true") > > .set("hl.maxAlternateFieldLength", 300); > > > > As far as I see, there are no definitions for alternate fields for the unified > > highlighter. How can I configure such a configuration? > > > > Kind Regards, > > Furkan KAMACI > > >
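For reference, a minimal SolrJ sketch of that combination (field pattern from the thread; the 300 value is illustrative):

solrQuery.setHighlight(true);
solrQuery.set("hl.method", "unified");
solrQuery.addHighlightField("content_*");
solrQuery.set("hl.maxAnalyzedChars", 300);

hl.maxAnalyzedChars bounds how much of each field the unified highlighter inspects per document, which is what caps the highlighting cost here.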
Re: Require java 8 upgrade
Hi Akhila, Here is the related documentation: https://lucene.apache.org/solr/5_3_1/SYSTEM_REQUIREMENTS.html which says: "Apache Solr runs of Java 7 or greater, Java 8 is verified to be compatible and may bring some performance improvements. When using Oracle Java 7 or OpenJDK 7, be sure to not use the GA build 147 or update versions u40, u45 and u51! We recommend using u55 or later." Kind Regards, Furkan KAMACI On Fri, May 22, 2020 at 4:26 AM Akhila John wrote: > Hi Team, > > We use solr 5.3.1 for sitecore 8.2. > We require to upgrade Java version to 'Java 8 Update 251' and remove / > Upgrade Wireshark to 3.2.3 in our application servers. > Could you please advise if this would have any impact on the solr. Does > solr 5.3.1 support Java 8. > > Thanks and regards, > > Akhila >
Re: TimestampUpdateProcessorFactory updates the field even if the value if present
Hi, Do you have an id field for your documents? On the other hand, does your document count increases when you index it again? Kind Regards, Furkan KAMACI On Fri, May 22, 2020 at 1:03 AM gnandre wrote: > Hi, > > I do not pass that field at all. > > Here is the document that I index again and again to test through Solr > Admin UI. > { > asset_id:"x:1", > title:"x" > } > > On Thu, May 21, 2020 at 5:25 PM Furkan KAMACI > wrote: > > > Hi, > > > > How do you index that document? Do you index it with an empty > > *index_time_stamp_create* field as the second time too? > > > > Kind Regards, > > Furkan KAMACI > > > > On Fri, May 22, 2020 at 12:05 AM gnandre > wrote: > > > > > Hi, > > > > > > Following is the update request processor chain. > > > > > > default="true" > > > > > > < > > > processor class="solr.TimestampUpdateProcessorFactory"> > > "fieldName">index_time_stamp_create class= > > > "solr.LogUpdateProcessorFactory" /> > > "solr.RunUpdateProcessorFactory" /> > > > > > > And, here is how the field is defined in schema.xml > > > > > > stored= > > > "true" /> > > > > > > Every time I index the same document, above field changes its value > with > > > latest timestamp. According to TimestampUpdateProcessorFactory javadoc > > > page, if a document does not contain a value in the timestamp field, a > > new > > > Date will be generated and added as the value of that field. After the > > > first indexing this document should always have a value, so why then it > > > gets updated later? > > > > > > I am using Solr Admin UI's Documents tab to index the document for > > testing. > > > I am using Solr 6.3 in master-slave architecture mode. > > > > > >
Re: TimestampUpdateProcessorFactory updates the field even if the value if present
Hi, How do you index that document? Do you index it with an empty *index_time_stamp_create* field as the second time too? Kind Regards, Furkan KAMACI On Fri, May 22, 2020 at 12:05 AM gnandre wrote: > Hi, > > Following is the update request processor chain. > > > < > processor class="solr.TimestampUpdateProcessorFactory"> "fieldName">index_time_stamp_create "solr.LogUpdateProcessorFactory" /> "solr.RunUpdateProcessorFactory" /> > > And, here is how the field is defined in schema.xml > > "true" /> > > Every time I index the same document, above field changes its value with > latest timestamp. According to TimestampUpdateProcessorFactory javadoc > page, if a document does not contain a value in the timestamp field, a new > Date will be generated and added as the value of that field. After the > first indexing this document should always have a value, so why then it > gets updated later? > > I am using Solr Admin UI's Documents tab to index the document for testing. > I am using Solr 6.3 in master-slave architecture mode. >
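For reference, a readable sketch of the chain and field described in this thread (the chain name and the date field type are assumptions; fieldName and the processor order are from the thread):

<updateRequestProcessorChain name="add-timestamp" default="true">
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">index_time_stamp_create</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<field name="index_time_stamp_create" type="date" indexed="true" stored="true" />

Note that re-indexing a document with the same id replaces it entirely rather than merging with stored values, so a document sent without the timestamp field reaches the processor empty there every time, and a fresh Date is generated on each re-index.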
Re: Solrcloud Garbage Collection Suspension linked across nodes?
Hi John, I've denied and dropped him from mail list. Kind Regards, Furkan KAMACI On Wed, May 13, 2020 at 8:06 PM John Blythe wrote: > can we get this person blocked? > -- > John Blythe > > > On Wed, May 13, 2020 at 1:05 PM ART GALLERY wrote: > > > check out the videos on this website TROO.TUBE don't be such a > > sheep/zombie/loser/NPC. Much love! > > https://troo.tube/videos/watch/aaa64864-52ee-4201-922f-41300032f219 > > > > On Mon, May 4, 2020 at 5:43 PM Webster Homer > > wrote: > > > > > > My company has several Solrcloud environments. In our most active cloud > > we are seeing outages that are related to GC pauses. We have about 10 > > collections of which 4 get a lot of traffic. The solrcloud consists of 4 > > nodes with 6 processors and 11Gb heap size (25Gb physical memory). > > > > > > I notice that the 4 nodes seem to do their garbage collection at almost > > the same time. That seems strange to me. I would expect them to be more > > staggered. > > > > > > This morning we had a GC pause that caused problems . During that time > > our application service was reporting "No live SolrServers available to > > handle this request" > > > > > > Between 3:55 and 3:56 AM all 4 nodes were having some amount of garbage > > collection pauses, for 2 of the nodes it was minor, for one it was 50%. > For > > 3 nodes it lasted until 3>57. However the node with the worst impact > > didn't recover until 4am. > > > > > > How is it that all 4 nodes were in lock step doing GC? If they all are > > doing GC at the same time it defeats the purpose of having redundant > cloud > > servers. > > > We just this weekend switched to use G1GC from CMS > > > > > > At this point in time we also saw that traffic to solr was not well > > distributed. The application calls solr using CloudSolrClient which I > > thought did its own load balancing. We saw 10X more traffic going to one > > solr node that all the others, the we saw it start hitting another node. > > All solr queries come from our application. > > > > > > During this period of time I saw only 1 error message in the solr log: > > > ERROR (zkConnectionManagerCallback-8-thread-1) [ ] > > o.a.s.c.ZkController There was a problem finding the leader in > > zk:org.apache.solr.common.SolrException: Could not get leader props > > > > > > We are currently using Solr 7.7.2 > > > GC Tuning > > > GC_TUNE="-XX:NewRatio=3 \ > > > -XX:SurvivorRatio=4 \ > > > -XX:TargetSurvivorRatio=90 \ > > > -XX:MaxTenuringThreshold=8 \ > > > -XX:+UseG1GC \ > > > -XX:MaxGCPauseMillis=250 \ > > > -XX:+ParallelRefProcEnabled" > > > > > > > > > > > > > > > This message and any attachment are confidential and may be privileged > > or otherwise protected from disclosure. If you are not the intended > > recipient, you must not copy this message or attachment or disclose the > > contents to any other person. If you have received this transmission in > > error, please notify the sender immediately and delete the message and > any > > attachment from your system. Merck KGaA, Darmstadt, Germany and any of > its > > subsidiaries do not accept liability for any omissions or errors in this > > message which may arise as a result of E-Mail-transmission or for damages > > resulting from any unauthorized changes of the content of this message > and > > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its > > subsidiaries do not guarantee that this message is free of viruses and > does > > not accept liability for any damages caused by any virus transmitted > > therewith. 
Re: Portable Solr
Hi, Are you looking for EmbeddedSolrServer [1]?: <https://cwiki.apache.org/confluence/display/solr/EmbeddedSolr> By the way, using Spring Boot with an embedded server is production ready [2]. On the other hand, embedded Solr with embedded Zookeeper etc. is less flexible and should be reserved for special circumstances. Kind Regards, Furkan KAMACI [1] https://cwiki.apache.org/confluence/display/solr/EmbeddedSolr [2] https://www.reddit.com/r/java/comments/499227/is_the_tomcat_server_embedded_in_spring_boot/ On Mon, Nov 4, 2019 at 9:59 AM Jörn Franke wrote: > Yes, simply search the mailing list or the web for embedded Solr and you > will find what you need. Nevertheless, running embedded is just for > development (also in case of Spring and others). Avoid it for an end user > facing server application. > > > On 03.11.2019 at 17:02, Java Developer wrote: > > > > Hi, > > > > Like portable embedded web servers (Spring Boot, Takes ( > > https://github.com/yegor256/takes), Undertow (http://undertow.io/) or > > Rapidoid (https://www.rapidoid.org/)) > > > > do we have a portable Solr server? I want to build a web application with > Solr > > with portability. > > > > The user should only need Java; the rest is portable... > > > > Please advise. > > > > Thanks >
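A minimal EmbeddedSolrServer sketch (solr home path and core name are illustrative; assumes a core already configured on disk; exception handling omitted):

import java.nio.file.Paths;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrInputDocument;

EmbeddedSolrServer server = new EmbeddedSolrServer(Paths.get("/path/to/solr/home"), "portable");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "1");
server.add(doc);
server.commit();
server.close();   // close() also shuts down the wrapped CoreContainer

It runs Solr inside the application's own JVM, which is what makes it portable, and also why it is usually reserved for development and embedding scenarios rather than an end-user facing server.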
Re: 8.2.0 getting warning - unable to load jetty, not starting JettyAdminServer
Hi Arnold, Such errors may arise due to file permission issues. I can run the latest version of Solr via its Docker image without any errors. Could you write which steps you follow to run Solr in Docker? Kind Regards, Furkan KAMACI On Tue, Aug 20, 2019 at 1:25 AM Arnold Bronley wrote: > Hi, > > I am getting the following warning in Solr admin UI logs. I did not get this > warning in Solr 8.1.1. > Please note that I am using the Solr docker slim image from here - > https://hub.docker.com/_/solr/ > > Unable to load jetty, not starting JettyAdminServer >
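For comparison, the image from that page normally starts cleanly like this (tag and container name illustrative):

docker run -d -p 8983:8983 --name my_solr solr:8.2.0-slim

Permission problems typically appear when a host directory is mounted over Solr's data directory without being writable by the container's solr user (uid 8983), so any volume mounts in the failing setup are worth checking first.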
Re: Slow Indexing scaling issue
Hi Parmeshwor, 2 hours for 3 gb of data seems too slow. We scale up to PBs in such a way: 1) Ignore all commits from client via IgnoreCommitOptimizeUpdateProcessorFactory 2) Heavy processes are done on external Tika server instead of Solr Cell with embedded Tika feature. 3) Adjust autocommit, softcommit and shard size according to your needs. 4) Adjust JVM parameters. 5) Do not use swap if you can. Kind Regards, Furkan KAMACI On Tue, Aug 13, 2019 at 8:37 PM Erick Erickson wrote: > Here’s some sample SolrJ code using TIka outside of Solr’s Extracting > Request Handler, along with some info about why loading Solr with the job > of extracting text is not optimal speed wise: > > https://lucidworks.com/post/indexing-with-solrj/ > > > On Aug 13, 2019, at 12:15 PM, Jan Høydahl wrote: > > > > You May want to review > https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems#SolrPerformanceProblems-SlowIndexing > for some hints. > > > > Make sure to index with multiple parallel threads. Also remember that > using /extract on the solr side is resource intensive and may make your > cluster slow and unstable. Better to use Tika or similar on the client side > and send text docs to solr. > > > > Jan Høydahl > > > >> 13. aug. 2019 kl. 16:52 skrev Parmeshwor Thapa < > thapa.parmesh...@gmail.com>: > >> > >> Hi, > >> > >> We are having some issue on scaling solr indexing. Looking for > suggestion. > >> > >> Setup : We have two solr cloud (7.4) instances running in separate cloud > >> VMs with an external zookeeper ensemble. > >> > >> We are sending async / non-blocking http request to index documents in > solr. > >> 2 > >> > >> cloud VMs ( 4 core * 32 GB) > >> > >> 16 gb allocated for jvm > >> > >> We are sending all types to document to solr , which it would extract > and > >> index, Using /update/extract request handler > >> > >> We have stopwords.txt and dictionary (7mb) for stemming. > >> > >> > >> > >> Issue : indexing speed is quite slow for us. It is taking around 2 > hours to > >> index around 3 gb of data. 10,000 documents(PDF, xls, word, etc). We are > >> planning to index approximately 10 tb of data. 
> >> > >> Below is the solr config setting and schema, > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> >> languageSet="auto" ruleType="APPROX" concat="true"/> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> >> tokenizerModel="en-token.bin" sentenceModel="en-sent.bin"/> > >> > >> > >> > >> >> posTaggerModel="en-pos-maxent.bin"/> > >> > >> >> dictionary="en-lemmatizer-again.dict.txt"/> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> >> stored="false"/> > >> > >> > >> > >> > >> > >> >> indexed="true" stored="true"/> > >> > >> >> required="true" stored="true"/> > >> > >> >> indexed="true" stored="true"/> > >> > >> >> indexed="true" stored="true"/> > >> > >> >> stored="true"/> > >> > >> >> indexed="true" stored="true"/> > >> > >> >> indexed="true" stored="true" /> > >> > >> >> stored="true"/> > >> > >> >> indexed="true" stored="true"/> > >> > >> >> indexed="true" stored="true"/> > >> > >> >> indexed="true" stored="false"/> > >> > >> >> indexed="true" stored="false"/> > >> > >> >> indexed="true" stored="true"/> > >> > >> >> indexed="true" stored="true"/> > >> > >> >> indexed="true" stored="true"/> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> stored="false" > >> docValues="false" /> > >> > >> > >> > >> And below is the solrConfig, > >> > >> > >> > >> > >> > >> BEST_COMPRESSION > >> > >> > >> > >> > >> > >> > >> > >> 1000 > >> > >> 60 > >> > >> false > >> > >> > >> > >> > >> > >> > >> > >> ${solr.autoSoftCommit.maxTime:-1} > >> > >> > >> > >> > >> > >> >> > >> startup="lazy" > >> > >> class="solr.extraction.ExtractingRequestHandler" > > >> > >> > >> > >> true > >> > >> ignored_ > >> > >> content > >> > >> > >> > >> > >> > >> *Thanks,* > >> > >> *Parmeshwor Thapa* > >
Re: Solr 7.6.0: PingRequestHandler - Changing the default query (*:*)
Hi, You can change invariants i.e. *qt* and *q* of a *PingRequestHandler*: /search some test query Check documentation fore more info: https://lucene.apache.org/solr/7_6_0//solr-core/org/apache/solr/handler/PingRequestHandler.html Kind Regards, Furkan KAMACI On Sat, Aug 3, 2019 at 4:17 PM Erick Erickson wrote: > You can also (I think) explicitly define the ping request handler in > solrconfig.xml to do something else. > > > On Aug 2, 2019, at 9:50 AM, Jörn Franke wrote: > > > > Not sure if this is possible, but why not create a query handler in Solr > with any custom query and you use that as ping replacement ? > > > >> Am 02.08.2019 um 15:48 schrieb dinesh naik : > >> > >> Hi all, > >> I have few clusters with huge data set and whenever a node goes down its > >> not able to recover due to below reasons: > >> > >> 1. ping request handler is taking more than 10-15 seconds to respond. > The > >> ping requesthandler however, expects it will return in less than 1 > second > >> and fails a requestrecovery if it is not responded to in this time. > >> Therefore recoveries never would start. > >> > >> 2. soft commit is very low ie. 5 sec. This is a business requirement so > >> not much can be done here. > >> > >> As the standard/default admin/ping request handler is using *:* queries > , > >> the response time is much higher, and i am looking for an option to > change > >> the same so that the ping handler returns the results within few > >> miliseconds. > >> > >> here is an example for standard query time: > >> > >> snip--- > >> curl " > >> > http://hostname:8983/solr/parts/select?indent=on&q=*:*&rows=0&wt=json&distrib=false&debug=timing > >> " > >> { > >> "responseHeader":{ > >> "zkConnected":true, > >> "status":0, > >> "QTime":16620, > >> "params":{ > >> "q":"*:*", > >> "distrib":"false", > >> "debug":"timing", > >> "indent":"on", > >> "rows":"0", > >> "wt":"json"}}, > >> "response":{"numFound":1329638799,"start":0,"docs":[] > >> }, > >> "debug":{ > >> "timing":{ > >> "time":16620.0, > >> "prepare":{ > >> "time":0.0, > >> "query":{ > >> "time":0.0}, > >> "facet":{ > >> "time":0.0}, > >> "facet_module":{ > >> "time":0.0}, > >> "mlt":{ > >> "time":0.0}, > >> "highlight":{ > >> "time":0.0}, > >> "stats":{ > >> "time":0.0}, > >> "expand":{ > >> "time":0.0}, > >> "terms":{ > >> "time":0.0}, > >> "block-expensive-queries":{ > >> "time":0.0}, > >> "slow-query-logger":{ > >> "time":0.0}, > >> "debug":{ > >> "time":0.0}}, > >> "process":{ > >> "time":16619.0, > >> "query":{ > >> "time":16619.0}, > >> "facet":{ > >> "time":0.0}, > >> "facet_module":{ > >> "time":0.0}, > >> "mlt":{ > >> "time":0.0}, > >> "highlight":{ > >> "time":0.0}, > >> "stats":{ > >> "time":0.0}, > >> "expand":{ > >> "time":0.0}, > >> "terms":{ > >> "time":0.0}, > >> "block-expensive-queries":{ > >> "time":0.0}, > >> "slow-query-logger":{ > >> "time":0.0}, > >> "debug":{ > >> "time":0.0} > >> > >> > >> snap > >> > >> can we use query: _root_:abc in the ping request handler ? Tried this > query > >> and its returning the results within few miliseconds and also the nodes > are > >> able to recover without any issue. > >> > >> we want to use _root_ field for querying as this field is available in > all > >> our clusters with below definition: > >> >> termOffsets="false" stored="false" termPayloads="false" termPositions= > >> "false" docValues="false" termVectors="false"/> > >> Could you please let me know if using _root_ for querying in > >> pingRequestHandler will cause any problem? 
> >> > >> >> name="invariants"> /select _root_:abc > >> > >> > >> -- > >> Best Regards, > >> Dinesh Naik > >
Re: NRT for new items in index
Hi, First of all, could you check here: https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ to better understand hard commits, soft commits and transaction logs to achieve NRT search. Kind Regards, Furkan KAMACI On Wed, Jul 31, 2019 at 3:47 PM profiuser wrote: > Hi, > > we have something about 400 000 000 items in a solr collection. > We have set up auto commit property for this collection to 15 minutes. > Is a big collection and we using some caches etc. Therefore we have big > autocommit value. > > This have disadvantage that we haven't NRT searches. > > We would like to have NRT at least for searching for the newly added items. > > We read about new functionality "Category routed alilases" in a solr > version > 8.1. > > And we got an idea, that we could add to our collection schema field for > routing. > And at the time of indexing we check if item is new and to routing field we > set up value "new", or the item is older than some time period we set up > value to "old". > And we will have one category routed alias routedCollection, and there will > be 2 collections old and new. > > If we index new item, router choose new collection and this item is > inserted > to it. After some period we reindex item and we decide that this item is > old > and to routing field we set up value "old". Router decide to update > (insert) > item to collection old. But we expect that solr automatically check > uniqueness in all routed collections. And if solr found item in other > collection, than will be automatically deleted. But not !!! > > Is this expected behaviour? > > Could be used this functionality for issue we have? Or could someone > suggest > another solution, which ensure that we have all new items ready for NRT > searches? > > Thanks for your help > > > > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
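A sketch of the split that article describes, in solrconfig.xml (times are illustrative):

<autoCommit>
  <maxTime>900000</maxTime>          <!-- hard commit: flush to disk, no new searcher -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>5000</maxTime>            <!-- soft commit: opens a searcher, makes new docs visible -->
</autoSoftCommit>

Because the cheap soft commit is what controls visibility, newly added items can become searchable within seconds without shortening the 15-minute hard commit interval mentioned above.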
Re: SolrCloud recommended I/O RAID level
Hi Adi, RAID10 is good for satisfying both indexing and query, striping across mirror sets. However, you lose half of your raw disk space, just like with RAID1. Here is a mail thread of mine which discusses RAID levels for Solr specific: https://lists.apache.org/thread.html/462d7467b2f2d064223eb46763a6a6e606ac670fe7f7b40858d97c0d@1366325333@%3Csolr-user.lucene.apache.org%3E Kind Regards, Furkan KAMACI On Mon, Jul 29, 2019 at 10:25 PM Kaminski, Adi wrote: > Hi, > We are about to size large environment with 7 nodes/servers with > replication factor 2 of SolrCloud cluster (using Solr 7.6). > > The system contains parent-child (nested documents) schema, and about to > have 40M parent docs with 50-80 child docs each (in total 2-3.2B Solr docs). > > We have a use case that will require to update parent document fields > triggered by an application flow (with re-indexing or atomic/partial update > approach, that will probably require to upgrade to Solr 8.1.1 that supports > this feature and contains some fixes in nested docs handling area). > > Since these updates might be quite heavy from IOPS perspective, we would > like to make sure that the IO hardware and RAID configuration are optimized > (r/w ratio of 50% read and 50% write, to allow balanced search and update > flows). > > Can someone share similar scale/use- case/deployment RAID level > configuration ? > (I assume that RAID5&6 are not an option due to parity/dual parity heavy > impact on write operations, so it leaves RAID 0, 1 or 10). > > Thanks in advance, > Adi > > > > > Sent from Workspace ONE Boxer > > > This electronic message may contain proprietary and confidential > information of Verint Systems Inc., its affiliates and/or subsidiaries. The > information is intended to be for the use of the individual(s) or > entity(ies) named above. If you are not the intended recipient (or > authorized to receive this e-mail for the intended recipient), you may not > use, copy, disclose or distribute to anyone this message or any information > contained in this message. If you have received this electronic message in > error, please notify us by replying to this e-mail. >
Re: Basic Query Not Working - Please Help
Hi Vipul, You are welcome! Kind Regards, Furkan KAMACI On Fri, Jul 26, 2019 at 11:07 AM Vipul Bahuguna < newthings4learn...@gmail.com> wrote: > Hi Furkan - > > I realized that I was searching incorrectly. > I later realized that if I need to search by specific field, I need to do > as you suggested - > q=appname:App1 . > > OR if need to simply search by App1, then I need to use to > index my field appname at the time of insertion so that it can be later > search without specifying the fieldname. > > thanks for your response. > > On Tue, Jul 23, 2019 at 6:07 AM Furkan KAMACI > wrote: > > > Hi Vipul, > > > > Which query do you submit? Is that one: > > > > q=appname:App1 > > > > Kind Regards, > > Furkan KAMACI > > > > On Mon, Jul 22, 2019 at 10:52 AM Vipul Bahuguna < > > newthings4learn...@gmail.com> wrote: > > > > > Hi, > > > > > > I have installed SOLR 8.1.1. > > > I am new and trying the very basics. > > > > > > I installed solr8.1.1 on Windows and I am using SOLR in standalone > mode. > > > > > > Steps I followed - > > > > > > 1. created a core as follows: > > > solr create_core -c dox > > > > > > 2. updated the managed_schema.xml file to add few specific fields > > specific > > > to my schema as belows: > > > > > > stored="true"/> > > > stored="true"/> > > > > stored="true"/> > > > > > stored="true"/> > > > > > > 3. then i restarted SOLR > > > > > > 4. then i went to the Documents tab to enter my sample data for > indexing, > > > which looks like below: > > > { > > > > > > "id" : "1", > > > "prjname" : "Project1", > > > "apps" : [ > > > { > > > "appname" : "App1", > > > "topics" : [ > > > { > > > "topicname" : "topic1", > > > "links" : [ > > > "http://www.google.com";, > > > "http://www.t6.com"; > > > ] > > > }, > > > { > > > "topicname" : "topic2", > > > "links" : [ > > > "http://www.java.com";, > > > "http://www.rediff.com"; > > > ] > > > } > > > ] > > > }, > > > { > > > "appname" : "App2", > > > "topics" : [ > > > { > > > "topicname" : "topic3", > > > "links" : [ > > > "http://www.t3.com";, > > > "http://www.t4.com"; > > > ] > > > }, > > > { > > > "topicname" : "topic4", > > > "links" : [ > > > "http://www.rules.com";, > > > "http://www.amazon.com"; > > > ] > > > } > > > ] > > > } > > > ] > > > } > > > > > > 5. Now when i go to Query tab and click Execute Search with *.*, it > shows > > > my recently added document as follows: > > > { > > > "responseHeader":{ "status":0, "QTime":0, "params":{ "q":"*:*", "_": > > > "1563780352100"}}, "response":{"numFound":1,"start":0,"docs":[ { > > "id":"1", > > > " > > > prjname":["Project1"], "apps":["{appname=App1, > topics=[{topicname=topic1, > > > links=[http://www.google.com, http://www.t6.com]}, {topicname=topic2, > > > links=[http://www.java.com, http://www.rediff.com]}]}";, > "{appname=App2, > > > topics=[{topicname=topic3, links=[http://www.t3.com, http://www.t4.com > > ]}, > > > {topicname=topic4, links=[http://www.rules.com, http://www.amazon.com > > > ]}]}"], > > > "_version_":1639742305772503040}] }} > > > > > > 6. But now when I am trying to search based on field topicname or > > prjname, > > > it does not returns any document. Even if put anything in q like App1, > > zero > > > results are being returned. > > > > > > > > > Can someone help me understanding what I might have done incorrectly? > > > May be I defined my schema incorrectly. > > > > > > Thanks in advance > > > > > >
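The mechanism referred to above is a copyField rule; a minimal sketch, assuming the default catch-all field _text_ that q searches when no field name is given:

<copyField source="appname" dest="_text_"/>

With that in place, q=App1 matches without the appname: prefix.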
Re: Problem with solr suggester in case of non-ASCII characters
Hi Roland, Could you check Analysis tab ( https://lucene.apache.org/solr/guide/8_1/analysis-screen.html) and tell how the term is analyzed for both query and index? Kind Regards, Furkan KAMACI On Tue, Jul 30, 2019 at 4:50 PM Szűcs Roland wrote: > Hi All, > > I have an author suggester (searchcomponent and the related request > handler) defined in solrconfig: > > > > > author > AnalyzingInfixLookupFactory > DocumentDictionaryFactory > BOOK_productAuthor > short_text_hu > suggester_infix_author > false > false > 2 > > > > startup="lazy" > > > true > 10 > author > > > suggest > > > > Author field has just a minimal text processing in query and index time > based on the following definition: > positionIncrementGap="100" multiValued="true"> > > > >ignoreCase="true"/> > > > > >ignoreCase="true"/> > > > >docValues="true"/> >docValues="true" multiValued="true"/> >positionIncrementGap="100"> > > > >ignoreCase="true"/> > > > > > > When I use qeries with only ASCII characters, the results are correct: > "Al":{ > "term":"Alexandre Dumas", "weight":0, "payload":""} > > When I try it with Hungarian authorname with special character: > "Jó":"author":{ > "Jó":{ "numFound":0, "suggestions":[]}} > > When I try it with three letters, it works again: > "Józ":"author":{ > "Józ":{ "numFound":10, "suggestions":[{ "term":"Bajza József", " > weight":0, "payload":""}, { "term":"Eötvös József", "weight":0, " > payload":""}, { "term":"Eötvös József", "weight":0, "payload":""}, { > "term":"Eötvös József", "weight":0, "payload":""}, { > "term":"József > Attila", "weight":0, "payload":""}.. > > Any idea how can it happen that a longer string has more matches than a > shorter one. It is inconsistent. What can I do to fix it as it would > results poor customer experience. > They would feel that sometimes they need 2 sometimes 3 characters to get > suggestions. > > Thanks in advance, > Roland >
Re: More Highlighting details
Hi Govind, Highlighting is the easiest way to detect it. You can find a similar question at here: https://stackoverflow.com/questions/9629147/how-to-return-column-that-matched-the-query-in-solr Kind Regards, Furkan KAMACI On Wed, Jul 24, 2019 at 9:28 PM govind nitk wrote: > Hi Furkan KAMACI, > > Thanks for your thoughts on maxAnalyzedChars. > > So, how can we get whether its matched or not? Is there any way to get such > data from extra payload in response from solr ? > > Thanks and regards > Govind > > On Wed, Jul 24, 2019 at 8:43 PM Furkan KAMACI > wrote: > > > Hi Govind, > > > > Using *hl.tag.pre* and *hl.tag.post* may help you. However you should > keep > > in mind that even such term exists in desired field, highlighter can use > > fallback field due to *hl.maxAnalyzedChars* parameter. > > > > Kind Regards, > > Furkan KAMACI > > > > On Wed, Jul 24, 2019 at 8:24 AM govind nitk > wrote: > > > > > Hi all, > > > How about using hl.tag pre and post. If these are present then there is > > > actually field match otherwise its default summary ? > > > Will it work or there are some cases where it will not ? > > > > > > > > > Thanks in advance. > > > > > > > > > > > > On Tue, Jul 23, 2019 at 5:48 PM govind nitk > > wrote: > > > > > > > Hi all, > > > > > > > > How to get more details for highlighting ? > > > > > > > > I am using > > > > > > > > > > hl.method=unified&&hl.fl=title,url,paragraph&hl.requireFieldMatch=true&hl.defaultSummary=true > > > > > > > > So, if query words not matched, I am getting defaultSummary, which is > > > > great. *Can we get more info saying whether it found matches or > default > > > > summary? How to get such information?* > > > > Also is it good idea to use highlighting on urls ? Will urls get > > > distorted > > > > by any chance ? > > > > > > > > > > > > Best Regards, > > > > Govind > > > > > > > > > > > > > >
Re: SOLR Atomic Update - String multiValued Field
Hi Doss, What was the existing value and what happens after you do the atomic update? Kind Regards, Furkan KAMACI On Wed, Jul 24, 2019 at 2:47 PM Doss wrote: > HI, > > I have a multiValued field of type String. > > <field name="namelist" type="string" multiValued="true"/> > > I want to keep this list unique, so I am using atomic updates with > "add-distinct" > > {"docid":123456,"namelist":{"add-distinct":["Adam","Jane"]}} > > but this is not maintaining the expected uniqueness, am I doing something > wrong? Guide me please. > > Thanks, > Doss. >
Re: More Highlighting details
Hi Govind, Using *hl.tag.pre* and *hl.tag.post* may help you. However you should keep in mind that even such term exists in desired field, highlighter can use fallback field due to *hl.maxAnalyzedChars* parameter. Kind Regards, Furkan KAMACI On Wed, Jul 24, 2019 at 8:24 AM govind nitk wrote: > Hi all, > How about using hl.tag pre and post. If these are present then there is > actually field match otherwise its default summary ? > Will it work or there are some cases where it will not ? > > > Thanks in advance. > > > > On Tue, Jul 23, 2019 at 5:48 PM govind nitk wrote: > > > Hi all, > > > > How to get more details for highlighting ? > > > > I am using > > > hl.method=unified&&hl.fl=title,url,paragraph&hl.requireFieldMatch=true&hl.defaultSummary=true > > > > So, if query words not matched, I am getting defaultSummary, which is > > great. *Can we get more info saying whether it found matches or default > > summary? How to get such information?* > > Also is it good idea to use highlighting on urls ? Will urls get > distorted > > by any chance ? > > > > > > Best Regards, > > Govind > > > > >
Re: Basic Query Not Working - Please Help
Hi Vipul, Which query do you submit? Is that one: q=appname:App1 Kind Regards, Furkan KAMACI On Mon, Jul 22, 2019 at 10:52 AM Vipul Bahuguna < newthings4learn...@gmail.com> wrote: > Hi, > > I have installed SOLR 8.1.1. > I am new and trying the very basics. > > I installed solr8.1.1 on Windows and I am using SOLR in standalone mode. > > Steps I followed - > > 1. created a core as follows: > solr create_core -c dox > > 2. updated the managed_schema.xml file to add few specific fields specific > to my schema as belows: > > > > > stored="true"/> > > 3. then i restarted SOLR > > 4. then i went to the Documents tab to enter my sample data for indexing, > which looks like below: > { > > "id" : "1", > "prjname" : "Project1", > "apps" : [ > { > "appname" : "App1", > "topics" : [ > { > "topicname" : "topic1", > "links" : [ > "http://www.google.com";, > "http://www.t6.com"; > ] > }, > { > "topicname" : "topic2", > "links" : [ > "http://www.java.com";, > "http://www.rediff.com"; > ] > } > ] > }, > { > "appname" : "App2", > "topics" : [ > { > "topicname" : "topic3", > "links" : [ > "http://www.t3.com";, > "http://www.t4.com"; > ] > }, > { > "topicname" : "topic4", > "links" : [ > "http://www.rules.com";, > "http://www.amazon.com"; > ] > } > ] > } > ] > } > > 5. Now when i go to Query tab and click Execute Search with *.*, it shows > my recently added document as follows: > { > "responseHeader":{ "status":0, "QTime":0, "params":{ "q":"*:*", "_": > "1563780352100"}}, "response":{"numFound":1,"start":0,"docs":[ { "id":"1", > " > prjname":["Project1"], "apps":["{appname=App1, topics=[{topicname=topic1, > links=[http://www.google.com, http://www.t6.com]}, {topicname=topic2, > links=[http://www.java.com, http://www.rediff.com]}]}";, "{appname=App2, > topics=[{topicname=topic3, links=[http://www.t3.com, http://www.t4.com]}, > {topicname=topic4, links=[http://www.rules.com, http://www.amazon.com > ]}]}"], > "_version_":1639742305772503040}] }} > > 6. But now when I am trying to search based on field topicname or prjname, > it does not returns any document. Even if put anything in q like App1, zero > results are being returned. > > > Can someone help me understanding what I might have done incorrectly? > May be I defined my schema incorrectly. > > Thanks in advance >
Alternate Fields for Unified Highlighter
Hi All, I want to switch to Unified Highlighter due to performance reasons for my Solr 7.6. I was using these fields:

solrQuery.addHighlightField("content_*")
    .set("f.content_en.hl.alternateField", "content")
    .set("f.content_es.hl.alternateField", "content")
    .set("hl.useFastVectorHighlighter", "true")
    .set("hl.maxAlternateFieldLength", 300);

As far as I see, there are no definitions for alternate fields for the unified highlighter. How can I configure such a configuration? Kind Regards, Furkan KAMACI
Solr URI Too Long
Hi, I got a URI Too Long error and am trying to fix it. I'm aware of this conversation: http://lucene.472066.n3.nabble.com/URI-is-too-long-td4254270.html I've tried:
- using POST instead of GET in SolrJ
- setting 2147483647 in solrconfig.xml for each core
- defining SOLR_OPTS="$SOLR_OPTS -Dorg.eclipse.jetty.server.Request.maxFormContentSize=200" in solr.in.sh
I need to send a long query into Solr. I use Solr 7.6.0 and plan to use 8.1 whenever available. Any ideas about how to overcome this? Kind Regards, Furkan KAMACI
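For reference, the POST variant from the first bullet looks like this in SolrJ (client and query string are illustrative; exception handling omitted):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.response.QueryResponse;

SolrQuery query = new SolrQuery(veryLongQueryString);
QueryResponse rsp = solrClient.query(query, SolrRequest.METHOD.POST);

With POST the query travels in the request body, so URI and request header limits stop applying to it; remaining failures then point at body size limits such as Jetty's maxFormContentSize.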
JSON Facet Count All Information
Hi, I have a multivalued field in which I store some metadata. I want to see the top 4 metadata values across my documents and also the total metadata count. I run this query: q=metadata:[*+TO+*]&rows=0&json.facet={top_tags:{type:terms,field:metadata,limit:4,mincount:1}} However, how can I calculate the total term count in a multivalued field besides running a JSON facet on it? Kind Regards, Furkan KAMACI
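A sketch of pulling those totals from the same facet (field name as above):

json.facet={
  top_tags : {
    type : terms,
    field : metadata,
    limit : 4,
    mincount : 1,
    numBuckets : true,
    allBuckets : true
  }
}

numBuckets reports the number of distinct terms across matching documents, and the special allBuckets bucket aggregates counts over all buckets, so a document contributes once per distinct term it holds, which approximates the total term count.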
Fetching All Terms and Corresponding Documents
Hi, I need to iterate on all terms at Solr index, and then find related documents for some terms that match my criteria. I know that I can send a query to *LukeRequestHandler*: */admin/luke?fl=content&numTerms={distinct term count}&wt=json* and then check my criteria. If matches, I can send a *fq* to retrieve related docs. However, is there any other efficient way (via rest or Solrj) for my case? Kind Regards, Furkan KAMACI
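For reference, the terms component can also stream the full term list, and is typically cheaper than LukeRequestHandler for this (collection and field are illustrative):

curl "http://localhost:8983/solr/mycollection/terms?terms.fl=content&terms.limit=-1&terms.sort=index&wt=json"

terms.limit=-1 removes the cap on returned terms; each term that passes the criteria can then be turned into an fq as described.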
Re: No registered leader was found after waiting for 1000ms in solr
Hi Maimuna, Could you check here: https://stackoverflow.com/questions/47868737/solr-cloud-no-registered-leader-was-found-after-waiting-for-4000ms Kind Regards, Furkan KAMACI On Wed, Mar 6, 2019 at 10:25 AM maimuna ambareen wrote: > when i run the healthcheck command in solr : > bin/solr healthcheck -c mypet -z x.x.x.x:2181 i am getting > No registered leader was found after waiting for 1000ms > > However i am able to find other details and list of live nodes as well in > the output. Can someone explain me the reason behind this error ? > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
Re: SolrCloud one server with high load
Hi Gaël, Does all three servers have same specifications? On the other hand, is your load balancing configuration for Varnish is round-robin? Kind Regards, Furkan KAMACI On Mon, Mar 4, 2019 at 3:18 PM Gael Jourdan-Weil < gael.jourdan-w...@kelkoogroup.com> wrote: > Hello, > > I come again to the community for some ideas regarding a performance issue > we are having. > > We have a SolrCloud cluster of 3 servers. > Each server hosts 1 replica of 2 collections. > There is no sharding, every server hosts the whole collection. > > Requests are evenly distributed by a Varnish system. > > During some peaks of requests, we see one server of the cluster having > very high load while the two others are totally fine. > The server experiencing this high load is always the same until we reboot > it and the behavior moves to another server. > The server experiencing the issue is not necessarily the leader. > All servers receive the same number of requests per seconds. > > Load data: > - Server1: 5% CPU when low QPS, 90% CPU when high QPS (this one having > issues) > - Server2: 5% CPU when low QPS, 25% CPU when high QPS > - Server3: 5% CPU when low QPS, 20% CPU when high QPS > > What could explain this behavior in SolrCloud mechanisms? > > Thank you for reading, > > Gaël Jourdan-Weil >
Re: Full import alternatives
Hi Sami, Did you check the delta-import documentation: https://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command Kind Regards, Furkan KAMACI On Thu, Feb 28, 2019 at 7:24 PM sami wrote: > Hi Shawn, can you please suggest a small program or at least a backbone of > a > program which can give me hints how exactly to achieve this. I quote: "I send a > full-import DIH command to all of the > shards, and each one makes an SQL query to MySQL, all of them running in > parallel. " > > > > -- > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
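From that page, the shape of a delta-enabled entity in data-config.xml looks like this (table and column names are illustrative):

<entity name="item" pk="id"
        query="SELECT * FROM item"
        deltaQuery="SELECT id FROM item WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT * FROM item WHERE id='${dih.delta.id}'"/>

A delta-import then re-fetches only the rows changed since the last run instead of repeating the full query on every shard.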
Re: Code review for SOLR related changes.
Hi Fiz, Could you elaborate your question? Kind Regards, Furkan KAMACI On Fri, Mar 1, 2019 at 7:41 PM Fiz Ahmed wrote: > Hi Solr Experts, > > Can you please suggest Code review techniques for SOLR related changes in a > Project. > > > Thanks > FIZ > AML Team. >
Re: Spring Boot Solr+ Kerberos+ Ambari
Hi, You can also check here: https://community.hortonworks.com/articles/15159/securing-solr-collections-with-ranger-kerberos.html On the other hand, we have a section for Solr Kerberos at documentation: https://lucene.apache.org/solr/guide/6_6/kerberos-authentication-plugin.html For any Ambari specific questions, you can ask them at this forum: https://community.hortonworks.com/topics/forum.html Kind Regards, Furkan KAMACI On Thu, Feb 21, 2019 at 1:43 PM Rushikesh Garadade < rushikeshgarad...@gmail.com> wrote: > Hi Furkan, > I think the link you provided is for ranger audit setting, please correct > me if wrong? > > I use HDP 2.6.5. which has Solr 5.6 > > Thank you, > Rushikesh Garadade > > > On Thu, Feb 21, 2019, 2:57 PM Furkan KAMACI > wrote: > > > Hi Rushikesh, > > > > Did you check here: > > > > > https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_security/content/solr_ranger_configure_solrcloud_kerberos.html > > > > By the way, which versions do you use? > > > > Kind Regards, > > Furkan KAMACI > > > > On Thu, Feb 21, 2019 at 11:41 AM Rushikesh Garadade < > > rushikeshgarad...@gmail.com> wrote: > > > > > Hi All, > > > > > > I am trying to set Kerberos for Solr which is installed on Hortonworks > > > Ambari. > > > > > > Q1. Is Ranger a mandatory component for Solr Kerberos configuration on > > > ambari.? > > > > > > I am getting little confused with documents available on internet for > > this. > > > I tried to do without ranger but not getting any success. > > > > > > If is there any good document for the same, please let me know. > > > > > > Thanks, > > > Rushikesh Garadade. > > > > > >
Re: [lucene > nori ] special characters issue
Hi, Could you give some more information about your configuration? Also, check here for how to debug the reason: https://lucene.apache.org/solr/guide/7_6/analysis-screen.html Kind Regards, Furkan KAMACI On Tue, Feb 12, 2019 at 11:34 AM 유정인 wrote: > > Hi, I'm using the "nori" analyzer. > > I'm not sure whether this is an error or intentional behavior. > > All special characters are filtered out. > > Special characters stored in the dictionary are also filtered. > > How do I get special characters through to the output? > >
Re: Spring Boot Solr+ Kerberos+ Ambari
Hi Rushikesh, Did you check here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_security/content/solr_ranger_configure_solrcloud_kerberos.html By the way, which versions do you use? Kind Regards, Furkan KAMACI On Thu, Feb 21, 2019 at 11:41 AM Rushikesh Garadade < rushikeshgarad...@gmail.com> wrote: > Hi All, > > I am trying to set Kerberos for Solr which is installed on Hortonworks > Ambari. > > Q1. Is Ranger a mandatory component for Solr Kerberos configuration on > ambari.? > > I am getting little confused with documents available on internet for this. > I tried to do without ranger but not getting any success. > > If is there any good document for the same, please let me know. > > Thanks, > Rushikesh Garadade. >
Re: Is anyone using proxy caching in front of solr?
Hi Joakim, I suggest you to read these resources: http://lucene.472066.n3.nabble.com/Varnish-td4072057.html http://lucene.472066.n3.nabble.com/SolrJ-HTTP-caching-td490063.html https://wiki.apache.org/solr/SolrAndHTTPCaches which gives information about HTTP Caching including Varnish Cache, Last-Modified, ETag, Expires, Cache-Control headers. Kind Regards, Furkan KAMACI On Wed, Feb 20, 2019 at 11:18 PM Joakim Hansson wrote: > Hello dear user list! > I work at a company in retail where we use solr to perform searches as you > type. > As soon as you type more than 1 characters in the search field solr starts > serving hits. > Of course this generates a lot of "unnecessary" queries (in the sense that > they are never shown to the user) which is why I started thinking about > using something like squid or varnish to cache a bunch of these 2-4 > character queries. > > It seems most stuff I find about it is from pretty old sources, but as far > as I know solrcloud doesn't have distributed cache support. > > Our indexes aren't updated that frequently, about 4 - 6 times a day. We > don't use a lot of shards and replicas (biggest index is split to 3 shards > with 2 replicas). All shards/replicas are not on the same solr host. > Our solr setup handles around 80-200 queries per second during the day with > peaks at >1500 before holiday season and sales. > > I haven't really read up on the details yet but it seems like I could use > etags and Expires headers to work around having to do some of that > "unnecessary" work. > > Is anyone doing this? Why? Why not? > > - peace! >
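Solr's side of this is configured in solrconfig.xml; a sketch (max-age is illustrative):

<httpCaching never304="false" lastModifiedFrom="openTime" etagSeed="Solr">
  <cacheControl>max-age=60, public</cacheControl>
</httpCaching>

With never304="false", Solr emits ETag and Last-Modified validators tied to the searcher's open time, so a proxy like Varnish or Squid can revalidate cheaply, and cached short-prefix queries expire naturally once a new searcher opens after an index update.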
Re: English Analyzer
Hi, As Walter suggested you can check it via analyses page. You can find more information here: https://lucene.apache.org/solr/guide/7_6/analysis-screen.html Kind Regards, Furkan KAMACI On Tue, Feb 5, 2019 at 8:51 PM Walter Underwood wrote: > Why? > > If you want to look at the results, install Solr, create a two fieldtypes > in the schema with the two analyzers, then use the analysis page to try > them. > > On the other hand, you could just use KStem. The Porter stemmers are > ancient technology and have some well-known limitations. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > > On Feb 5, 2019, at 9:13 AM, akash jayaweera > wrote: > > > > Thank you very much for the valuable information. > > But I need to use org.apache.lucene.analysis *API* for English and do > the > > analysis process. > > Ex :- when I submit English document to the API, I want the Analyzed > > Document. So I can see the difference made by the English analyzer. > > > > Regards, > > *Akash Jayaweera.* > > > > > > E akash.jayawe...@gmail.com > > M + 94 77 2472635 <+94%2077%20247%202635> > > > > > > On Tue, Feb 5, 2019 at 5:54 PM Dave > wrote: > > > >> This will tell you pretty everything you need to get started > >> > >> https://lucene.apache.org/solr/guide/6_6/language-analysis.html > >> > >>> On Feb 5, 2019, at 4:55 AM, akash jayaweera > > >> wrote: > >>> > >>> Hello All, > >>> > >>> Can i get details how to use English analyzer with stemming, > >>> lemmatizatiion, stopword removal techniques. > >>> I want to see the difference between before and after applying the > >> English > >>> analyzer > >>> > >>> Regards, > >>> *Akash Jayaweera.* > >>> > >>> > >>> E akash.jayawe...@gmail.com > >>> M + 94 77 2472635 <+94%2077%20247%202635> > >> > >
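A minimal sketch of driving that API directly (Lucene's English analyzer; sample text illustrative; exception handling omitted):

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

try (EnglishAnalyzer analyzer = new EnglishAnalyzer()) {
  TokenStream ts = analyzer.tokenStream("content", "The runners were running quickly");
  CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
  ts.reset();
  while (ts.incrementToken()) {
    System.out.println(term.toString());  // prints: runner, were, run, quickli
  }
  ts.end();
  ts.close();
}

Comparing the printed tokens against the raw input shows what the stopword removal and stemming stages did ("the" dropped, "quickly" Porter-stemmed to "quickli"); note EnglishAnalyzer uses the Porter stemmer, so trying KStem as suggested above would need a custom analysis chain instead.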
Java Advanced Imaging (JAI) Image I/O Tools are not installed
Hi All, I use Solr 6.5.0 and am testing its OCR capabilities. It OCRs PDF files, even though it is quite slow. However, I see this error when I check the logs: o.a.p.c.PDFStreamEngine Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed Any idea how to fix this? Kind Regards, Furkan KAMACI
Rename of Category.QUERYHANDLER
Hi, Solr 6.3.0 had SolrInfoMBean.Category.QUERYHANDLER. However, I cannot see it at Solr 6.5.0. What is the new name of that variable? Kind Regards, Furkan KAMACI
Solr OCR Support
Hi All, I want to index images and pdf documents which have images into Solr. I test it with my Solr 6.3.0. I've installed tesseract at my computer (Mac). I verify that Tesseract works fine to extract text from an image. I index image into Solr but it has no content. However, as far as I know, I don't need to do anything else to integrate Tesseract with Solr. I've checked these but they were not useful for me: http://lucene.472066.n3.nabble.com/TIKA-OCR-not-working-td4201834.html http://lucene.472066.n3.nabble.com/Fwd-configuring-Solr-with-Tesseract-td4361908.html My question is, how can I support OCR with Solr? Kind Regards, Furkan KAMACI
Re: Update Request Processors are Not Chained
I found the problem :) Problem is processor are not combined into one chain. On Thu, Oct 4, 2018 at 3:57 PM Furkan KAMACI wrote: > I've defined my update processors as: > > > class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory"> > > content > en,tr > language_code > other > true > true > > > > > > > > > true > signature > false > content > 3 > name="signatureClass">org.apache.solr.update.processor.TextProfileSignature > > > > > > default="true"> > > 200 > > > > > > > My /update/extract request handler is as follows: > > startup="lazy" > class="solr.extraction.ExtractingRequestHandler" > > > true > true > ignored_ > content > ignored_ > ignored_ > > > dedupe > langid > ignore-commit-from-client > > > > dedupe chain works nd signature field is populated but langid processor is > not triggered at this combination. When I change their places: > > startup="lazy" > class="solr.extraction.ExtractingRequestHandler" > > > true > true > ignored_ > content > ignored_ > ignored_ > > > langid > dedupe > ignore-commit-from-client > > > > langid works but dedup is not activated (signature field is disappears). > > I use Solr 6.3. How can I solve this problem? > > Kind Regards, > Furkan KAMACI >
Update Request Processors are Not Chained
I've defined my update processors as: content en,tr language_code other true true true signature false content 3 org.apache.solr.update.processor.TextProfileSignature 200 My /update/extract request handler is as follows: true true ignored_ content ignored_ ignored_ dedupe langid ignore-commit-from-client The dedupe chain works and the signature field is populated but the langid processor is not triggered in this combination. When I change their places: true true ignored_ content ignored_ ignored_ langid dedupe ignore-commit-from-client langid works but dedupe is not activated (the signature field disappears). I use Solr 6.3. How can I solve this problem? Kind Regards, Furkan KAMACI
Re: Solr 6.6 LanguageDetector
Here is my schema configuration: On Wed, Oct 3, 2018 at 10:50 AM Furkan KAMACI wrote: > Hi, > > I use Solr 6.6 and try to test automatic language detection. I've added > these configuration into my solrconfig.xml. > > > class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory"> > > content > en,tr > language_code > other > true > true > > > > > > ... > startup="lazy" > class="solr.extraction.ExtractingRequestHandler" > > > true > true > ignored_ > content > ignored_ > ignored_ > > > dedupe > langid > ignore-commit-from-client > > > > content field is populated but content_en, content_tr, content_other and > language_code fields are empty. > > What I miss? > > Kind Regards, > Furkan KAMACI >
Solr 6.6 LanguageDetector
Hi, I use Solr 6.6 and am trying to test automatic language detection. I've added this configuration into my solrconfig.xml: content en,tr language_code other true true ... true true ignored_ content ignored_ ignored_ dedupe langid ignore-commit-from-client The content field is populated but the content_en, content_tr, content_other and language_code fields are empty. What am I missing? Kind Regards, Furkan KAMACI
Re: Java 9
Hi, Here is an explanation about the deprecation of CMS: https://docs.oracle.com/javase/9/gctuning/concurrent-mark-sweep-cms-collector.htm Kind Regards, Furkan KAMACI On Tue, Nov 7, 2017 at 10:46 AM, Daniel Collins wrote: > Oh, blimey, have Oracle gone with Ubuntu-style numbering now? :) > > On 7 November 2017 at 08:27, Markus Jelsma > wrote: > > > Shawn, > > > > There won't be a Java 10, we'll get Java 18.3 instead. After 9 it is a > > guess when CMS and friends are gone. > > > > Regards, > > Markus > > > > > > > > -Original message- > > > From:Shawn Heisey > > > Sent: Tuesday 7th November 2017 0:24 > > > To: solr-user@lucene.apache.org > > > Subject: Re: Java 9 > > > > > > On 11/6/2017 3:07 PM, Petersen, Robert (Contr) wrote: > > > > Anyone else been noticing this msg when starting up solr with > > java 9? (This is just an FYI and not a real question) > > > > > > > > Java HotSpot(TM) 64-Bit Server VM warning: Option UseConcMarkSweepGC > > was deprecated in version 9.0 and will likely be removed in a future > > release. > > > > Java HotSpot(TM) 64-Bit Server VM warning: Option UseParNewGC was > > deprecated in version 9.0 and will likely be removed in a future release. > > > > > > I have not tried Java 9 yet. > > > > > > Looks like G1 is now the default garbage collector. I did not know > that > > > they were deprecating CMS and ParNew ... that's a little surprising. > > > Solr's default garbage collection tuning uses those two collectors. It > > > is likely that those choices will be available in all versions of Java > > > 9. It would be very uncharacteristic for Oracle to take action on > > > removing them until version 10, possibly later. > > > > > > If it were solely up to me, I would adjust Solr's startup script to use > > > the G1 collector by default, eliminating the warnings you're seeing. > > > It's not just up to me though. Lucene documentation says to NEVER use > > > the G1 collector because they believe it to be unpredictable and have > > > the potential to cause problems. I personally have never had any > issues > > > with it. There is *one* Lucene issue mentioning problems with G1GC, > and > > > that issue is *specific* to the 32-bit JVM, which is not recommended > > > because of the limited amount of memory it can use. > > > > > > My experiments with GC tuning show the G1 collector (now default in > Java > > > 9) to have very good characteristics with Solr. I have a personal page > > > on the Solr wiki that covers those experiments. > > > > > > https://wiki.apache.org/solr/ShawnHeisey > > > > > > Thanks, > > > Shawn > > > > > >
Re: solr cloud updatehandler stats mismatch
Hi Wei, Do you compare it with files which are under /var/solr/logs by default? Kind Regards, Furkan KAMACI On Sun, Nov 5, 2017 at 6:59 PM, Wei wrote: > Hi, > > I use the following api to track the number of update requests: > > /solr/collection1/admin/mbeans?cat=UPDATE&stats=true&wt=json > > > Result: > > >- class: "org.apache.solr.handler.UpdateRequestHandler", >- version: "6.4.2.1", >- description: "Add documents using XML (with XSLT), CSV, JSON, or >javabin", >- src: null, >- stats: >{ > - handlerStart: 1509824945436, > - requests: 106062, > - ... > > > I am quite confused that the number of requests reported above is quite > different from the count from solr access logs. A few times the handler > stats is much higher: handler reports ~100k requests but in the access log > there are only 5k update requests. What could be the possible cause? > > Thanks, > Wei >
Re: SolrJ Java API examples
Hi Vishal, You can also check here: https://lucene.apache.org/solr/guide/6_6/using-solrj.html#using-solrj You can get enough information about how to use it. Kind Regards, Furkan KAMACI On Thu, Sep 14, 2017 at 1:25 PM, Leonardo Perez Pulido < leoperezpul...@gmail.com> wrote: > Hi, > This may help: > > https://github.com/leoperezpulido/lucene-solr/tree/master/solr/solrj/src/ > test/org/apache/solr/client/solrj > > Regards. > > On Thu, Sep 14, 2017 at 4:21 AM, Vishal Srivastava < > vishal.smu@gmail.com > > wrote: > > > Hi, > > I'm a beginner at SolrJ , and am currently looking to implement and > > integrate the same at my current organisation using Java . > > > > After a lot of research, I failed to find any good material / examples > for > > SolrJ 's Java library that I could use as reference. > > > > Please suggest some good material. > > > > Thanks a ton. > > > > Vishal Srivastava. > > >
Re: Solr Spatial Index and Data
Hi Can, For your first question: you should share more information with us, as Rick indicated. Do you have any errors, do you have unique ids or not, etc.? For the second one: you should read here: https://cwiki.apache.org/confluence/display/solr/Spatial+Search and ask your questions if you have any. Kind Regards, Furkan KAMACI On Thu, Sep 14, 2017 at 1:34 PM, Rick Leir wrote: > hi Can Ezgi > > First of all, i want to use a spatial index for my data, which includes polygons > and points. But Solr indexed the first 18 rows; other rows were not indexed. > > Do all rows have a unique id field? > > Are there errors in the logfile? > cheers -- Rick > > > . >
Re: index new discovered fileds of different types
Hi Thaer, Do you use schemaless mode [1] ? Kind Regards, Furkan KAMACI [1] https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode On Wed, Jul 5, 2017 at 4:23 PM, Thaer Sammar wrote: > Hi, > We are trying to index documents of different types. Documents have > different fields. Fields are known at indexing time. We run a query on a > database and we index what comes back, using query variables as field names in > solr. Our current solution: we use dynamic fields with a prefix, for example > feature_i_*. The issues with that: > 1) we need to define the type of the dynamic field, and to be able to cover > the types of discovered fields we define the following: > feature_i_* for integers, feature_t_* for strings, feature_d_* for doubles, > > 1.a) this means we need to check the type of the discovered field and then > put it in the corresponding dynamic field > 2) at search time, we need to know the right prefix > We are looking for help to find a way to avoid the prefix and the type check > > regards, > Thaer
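Since the type check in 1.a has to happen somewhere, one way to keep the suffix convention in a single place on the indexing side is a small helper; a sketch assuming SolrJ and the feature_i_/feature_d_/feature_t_ prefixes from the mail above:

import org.apache.solr.common.SolrInputDocument;

public class DynamicFieldMapper {
  // Maps a discovered value onto the dynamic-field suffix convention:
  // feature_i_* for integers, feature_d_* for doubles, feature_t_* for strings.
  static void addFeature(SolrInputDocument doc, String name, Object value) {
    String prefix;
    if (value instanceof Integer || value instanceof Long) {
      prefix = "feature_i_";
    } else if (value instanceof Double || value instanceof Float) {
      prefix = "feature_d_";
    } else {
      prefix = "feature_t_";
    }
    doc.addField(prefix + name, value);
  }
}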
Re: Automatically Restart Solr
Hi Jeck, Here is the documentation about how you can run Solr as service: https://lucene.apache.org/solr/guide/6_6/taking-solr-to-production.html However, as far as I see you use Windows as operating system. There is currently an open issue for creating scripts to run as a Windows Service: https://issues.apache.org/jira/browse/SOLR-7105 but not yet completed. Could you check this: http://coding-art.blogspot.com.tr/2016/07/running-solr-61-as-windows-service.html Kind Regards, Furkan KAMACI On Sun, Jul 2, 2017 at 6:12 PM, rojerick luna wrote: > Hi, > > Anyone who successfully set this up? Thanks > > Best Regards, > Jeck > > > On 20 Jun 2017, at 7:10 PM, rojerick luna > wrote: > > > > Hi, > > > > I'm trying to automate Solr restart every week. > > > > I created a stop.bat and updated the start.bat which I found on an > article online. Using stop.bat and start.bat is working fine. However when > I created a Task Scheduler (Windows Scheduler) and setup the frequency to > stop and start (using the bat files), it's not working; the Solr app didn't > restart. > > > > Please let me know if you have successfully tried it and send me steps > how you've setup the Task Scheduler. > > > > Best Regards, > > Jeck Luna > >
SSN Regex Search
Hi, How can I search for an SSN regex pattern in a way that overcomes the special dash character issue? As you know, /[0-9]{3}-[0-9]{2}-[0-9]{4}/ will not work as intended. Kind Regards, Furkan KAMACI
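The usual reason such a pattern fails is that a tokenized field splits the SSN on the dashes before the regex is applied, so the index only contains the three digit groups as separate terms. A sketch of one workaround, assuming the SSN is additionally indexed into a non-tokenized string field; the field name ssn_s and the core URL are placeholders:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SsnRegexSearch {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build()) {
      // The regex runs against indexed terms, so ssn_s must be a
      // non-tokenized string field where the dashes survive indexing.
      SolrQuery query = new SolrQuery("ssn_s:/[0-9]{3}-[0-9]{2}-[0-9]{4}/");
      QueryResponse response = client.query(query);
      System.out.println("hits: " + response.getResults().getNumFound());
    }
  }
}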
Solr SQL Subquery Support
Hi, Does Solr SQL support subqueries? Kind Regards, Furkan KAMACI
Re: Inconsistent Counts in Cloud at Solr SQL Queries
Thanks for the answer! Does facet mode use Solr JSON requests or the new facet API (which is faster than the old one)? On Mon, Apr 24, 2017 at 2:18 PM, Joel Bernstein wrote: > SQL has two aggregation modes: facet and map_reduce. Facet uses the json > facet API directly so SOLR-7452 would apply if it hasn't been resolved yet. > map_reduce always gives accurate results regardless of the cardinality but > is slower. To increase performance using map_reduce you need to increase > the size of the cluster (workers, shards, replicas). > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Mon, Apr 24, 2017 at 5:09 AM, Furkan KAMACI > wrote: > > > Hi, > > > > As you know, the json facet api returns inconsistent counts in a cloud set > up > > (SOLR-7452). I would like to learn whether the situation is the same for Solr > SQL > > queries too. > > > > Kind Regards, > > Furkan KAMACI > >
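For completeness, the aggregation mode can be chosen per connection when going through Solr's JDBC driver (available since Solr 6); a sketch, assuming solr-solrj on the classpath, a ZooKeeper at localhost:2181, and a collection named collection1:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SolrSqlAggregationModes {
  public static void main(String[] args) throws Exception {
    // facet mode delegates to the JSON Facet API (fast, but SOLR-7452-style
    // inaccuracies can apply); map_reduce shuffles tuples to worker nodes
    // and is exact at the cost of speed.
    String url = "jdbc:solr://localhost:2181?collection=collection1"
        + "&aggregationMode=map_reduce&numWorkers=2";
    try (Connection con = DriverManager.getConnection(url);
         Statement stmt = con.createStatement();
         ResultSet rs = stmt.executeQuery(
             "select fieldA, count(*) from collection1 group by fieldA")) {
      while (rs.next()) {
        // Access by position; label-based access depends on driver version.
        System.out.println(rs.getString(1) + " " + rs.getString(2));
      }
    }
  }
}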
Inconsistent Counts in Cloud at Solr SQL Queries
Hi, As you know, the json facet api returns inconsistent counts in a cloud setup (SOLR-7452). I would like to learn whether the situation is the same for Solr SQL queries too. Kind Regards, Furkan KAMACI
Re: Solr Stream Content from URL
Hi Alexandre, My content is protected via Basic Authentication. Is it possible to use Basic Authentication with Solr Content Streams? Kind Regards, Furkan KAMACI On Wed, Apr 19, 2017 at 9:13 PM, Alexandre Rafalovitch wrote: > Have you tried stream.url parameter after enabling the > enableRemoteStreaming flag? > https://cwiki.apache.org/confluence/display/solr/Content+Streams > > Regards, >Alex. > > http://www.solr-start.com/ - Resources for Solr users, new and experienced > > > On 19 April 2017 at 13:27, Furkan KAMACI wrote: > > Hi, > > > > Is it possible to stream a CSV content from URL to Solr? > > > > I've tried URLDataSource but could not figure out about what to use as > > document. > > > > Kind Regards, > > Furkan KAMACI >
Solr Stream Content from URL
Hi, Is it possible to stream CSV content from a URL to Solr? I've tried URLDataSource but could not figure out what to use as the document. Kind Regards, Furkan KAMACI
Re: Filter Facet Query
Hi Alex, I found the reason, thanks for the help. The facet shows all possible values, including 0. Could you help with my last question: I have facet results like: "", 9 "research",6 "development",3 I want to filter the empty string "" from my facet (I don't want to add it to fq, just filter it from the facets). How can I do that? On Tue, Apr 18, 2017 at 11:52 AM, Alexandre Rafalovitch wrote: > Are you saying that all the values in the facet are zero with that > query? The query you gave seems to be the super-basic faceting code, > so maybe something super-basic is missing. > > E.g. > *) Did you check that the documents you get back actually have any > values in that field to facet on? > *) Did you try making a query just by ID for a document that > definitely has the value in that field? > *) Did you do the query with echoParams=all to see that you are not > having any hidden extra parameters that get appended? > > Regards, >Alex. > > > http://www.solr-start.com/ - Resources for Solr users, new and experienced > > > On 18 April 2017 at 11:43, Furkan KAMACI wrote: > > OK, it returns 0 results every time. > > > > So, > > > > I want to filter out research values with empty string ("") from facet > > result. How can I do that? > > > > > > On Tue, Apr 18, 2017 at 8:53 AM, Furkan KAMACI > > wrote: > > > >> The first problem is that they do not match the main query. > >> > >> On Tue, Apr 18, 2017 at 01:54, Dave < hastings.recurs...@gmail.com> > >> wrote: > >>> Min.count is what you're looking for to get non 0 facets > >>> > >>> > On Apr 17, 2017, at 6:51 PM, Furkan KAMACI > >>> wrote: > >>> > > >>> > My query: > >>> > > >>> > /select?facet.field=research&facet=on&q=content:test > >>> > > >>> > Q1) Facet returns research values with 0 counts which has a research > value > >>> > that is not from a document matched by main query (content:test). Is > that > >>> > usual? > >>> > > >>> > Q2) I want to filter out research values with empty string ("") from > facet > >>> > result. How can I do that? > >>> > > >>> > Kind Regards, > >>> > Furkan KAMACI > >>> > >> >
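One low-tech option for the empty-string bucket asked about above is to drop it in the client after the query, leaving fq untouched; a SolrJ sketch (the index-time alternative would be to avoid indexing blank values at all, e.g. with an update processor such as solr.RemoveBlankFieldUpdateProcessorFactory, if available in the version used):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FacetBucketFilter {
  // Returns the facet buckets for the given field, minus the empty-string bucket.
  static List<FacetField.Count> nonEmptyBuckets(QueryResponse response, String field) {
    List<FacetField.Count> kept = new ArrayList<>();
    for (FacetField.Count count : response.getFacetField(field).getValues()) {
      if (count.getName() != null && !count.getName().isEmpty()) {
        kept.add(count);
      }
    }
    return kept;
  }
}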
Re: Filter Facet Query
OK, it returns 0 results every time. So, I want to filter out research values with empty string ("") from the facet result. How can I do that? On Tue, Apr 18, 2017 at 8:53 AM, Furkan KAMACI wrote: > The first problem is that they do not match the main query. > > On Tue, Apr 18, 2017 at 01:54, Dave > wrote: >> Min.count is what you're looking for to get non 0 facets >> >> > On Apr 17, 2017, at 6:51 PM, Furkan KAMACI >> wrote: >> > >> > My query: >> > >> > /select?facet.field=research&facet=on&q=content:test >> > >> > Q1) Facet returns research values with 0 counts which has a research >> value >> > that is not from a document matched by main query (content:test). Is >> that >> > usual? >> > >> > Q2) I want to filter out research values with empty string ("") from >> facet >> > result. How can I do that? >> > >> > Kind Regards, >> > Furkan KAMACI >> >
Re: Filter Facet Query
The first problem is that they do not match the main query. On Tue, Apr 18, 2017 at 01:54, Dave wrote: > Min.count is what you're looking for to get non 0 facets > > > On Apr 17, 2017, at 6:51 PM, Furkan KAMACI > wrote: > > > > My query: > > > > /select?facet.field=research&facet=on&q=content:test > > > > Q1) Facet returns research values with 0 counts which has a research > value > > that is not from a document matched by main query (content:test). Is that > > usual? > > > > Q2) I want to filter out research values with empty string ("") from > facet > > result. How can I do that? > > > > Kind Regards, > > Furkan KAMACI >
Filter Facet Query
My query: /select?facet.field=research&facet=on&q=content:test Q1) The facet returns research values with 0 counts, i.e. values that do not come from any document matched by the main query (content:test). Is that usual? Q2) I want to filter out research values with an empty string ("") from the facet result. How can I do that? Kind Regards, Furkan KAMACI
Re: Filter if Field Exists
@Alexandre Rafalovitch, I could define the empty string ("") as a default value, but then I also facet on that field, so I would need to filter empty strings from the facet generation logic. By the way, which one is faster: defining the empty string as a default value and appending (OR type:"") to queries, or negative search clauses? On Mon, Apr 17, 2017 at 2:22 PM, Furkan KAMACI wrote: > On the other hand, that query does not do what I want. > > On Mon, Apr 17, 2017 at 2:18 PM, Furkan KAMACI > wrote: > >> Btw, what is the difference between >> >> +name:test +(type:research (*:* -type:[* TO *])) >> >> and >> >> +name:test +(type:research -type:[* TO *]) >> >> On Mon, Apr 17, 2017 at 1:33 PM, Furkan KAMACI >> wrote: >> >>> Actually, amount of documents which have 'type' field is relatively too >>> small across all documents at index. >>> >>> On Mon, Apr 17, 2017 at 7:08 AM, Alexandre Rafalovitch < >>> arafa...@gmail.com> wrote: >>> >>>> What about setting a default value for the field? That is probably >>>> faster than negative search clauses? >>>> >>>> Regards, >>>>Alex. >>>> >>>> http://www.solr-start.com/ - Resources for Solr users, new and >>>> experienced >>>> >>>> >>>> On 16 April 2017 at 23:58, Mikhail Khludnev wrote: >>>> > +name:test +(type:research (*:* -type:[* TO *])) >>>> > >>>> > On Sun, Apr 16, 2017 at 11:47 PM, Furkan KAMACI < >>>> furkankam...@gmail.com> >>>> > wrote: >>>> > >>>> >> Hi, >>>> >> >>>> >> I have a schema like: >>>> >> >>>> >> name, >>>> >> department, >>>> >> type >>>> >> >>>> >> type is an optional field. Some documents don't have that field. >>>> Let's >>>> >> assume I have these: >>>> >> >>>> >> Doc 1: >>>> >> name: test >>>> >> type: research >>>> >> >>>> >> Doc 2: >>>> >> name: test >>>> >> type: developer >>>> >> >>>> >> Doc 3: >>>> >> name: test >>>> >> >>>> >> I want to search name: test and type:research if type field exists >>>> (result >>>> >> will be Doc 1 and Doc 3). >>>> >> >>>> >> How can I do that? >>>> >> >>>> >> Kind Regards, >>>> >> Furkan KAMACI >>>> >> >>>> > >>>> > >>>> > >>>> > -- >>>> > Sincerely yours >>>> > Mikhail Khludnev >>>> >>> >>> >> >
Re: Filter if Field Exists
On the other hand, that query does not do what I want. On Mon, Apr 17, 2017 at 2:18 PM, Furkan KAMACI wrote: > Btw, what is the difference between > > +name:test +(type:research (*:* -type:[* TO *])) > > and > > +name:test +(type:research -type:[* TO *]) > > On Mon, Apr 17, 2017 at 1:33 PM, Furkan KAMACI > wrote: > >> Actually, amount of documents which have 'type' field is relatively too >> small across all documents at index. >> >> On Mon, Apr 17, 2017 at 7:08 AM, Alexandre Rafalovitch < >> arafa...@gmail.com> wrote: >> >>> What about setting a default value for the field? That is probably >>> faster than negative search clauses? >>> >>> Regards, >>>Alex. >>> >>> http://www.solr-start.com/ - Resources for Solr users, new and >>> experienced >>> >>> >>> On 16 April 2017 at 23:58, Mikhail Khludnev wrote: >>> > +name:test +(type:research (*:* -type:[* TO *])) >>> > >>> > On Sun, Apr 16, 2017 at 11:47 PM, Furkan KAMACI < >>> furkankam...@gmail.com> >>> > wrote: >>> > >>> >> Hi, >>> >> >>> >> I have a schema like: >>> >> >>> >> name, >>> >> department, >>> >> type >>> >> >>> >> type is an optional field. Some documents don't have that field. Let's >>> >> assume I have these: >>> >> >>> >> Doc 1: >>> >> name: test >>> >> type: research >>> >> >>> >> Doc 2: >>> >> name: test >>> >> type: developer >>> >> >>> >> Doc 3: >>> >> name: test >>> >> >>> >> I want to search name: test and type:research if type field exists >>> (result >>> >> will be Doc 1 and Doc 3). >>> >> >>> >> How can I do that? >>> >> >>> >> Kind Regards, >>> >> Furkan KAMACI >>> >> >>> > >>> > >>> > >>> > -- >>> > Sincerely yours >>> > Mikhail Khludnev >>> >> >> >
Re: Filter if Field Exists
Btw, what is the difference between +name:test +(type:research (*:* -type:[* TO *])) and +name:test +(type:research -type:[* TO *]) On Mon, Apr 17, 2017 at 1:33 PM, Furkan KAMACI wrote: > Actually, amount of documents which have 'type' field is relatively too > small across all documents at index. > > On Mon, Apr 17, 2017 at 7:08 AM, Alexandre Rafalovitch > wrote: > >> What about setting a default value for the field? That is probably >> faster than negative search clauses? >> >> Regards, >>Alex. >> >> http://www.solr-start.com/ - Resources for Solr users, new and >> experienced >> >> >> On 16 April 2017 at 23:58, Mikhail Khludnev wrote: >> > +name:test +(type:research (*:* -type:[* TO *])) >> > >> > On Sun, Apr 16, 2017 at 11:47 PM, Furkan KAMACI > > >> > wrote: >> > >> >> Hi, >> >> >> >> I have a schema like: >> >> >> >> name, >> >> department, >> >> type >> >> >> >> type is an optional field. Some documents don't have that field. Let's >> >> assume I have these: >> >> >> >> Doc 1: >> >> name: test >> >> type: research >> >> >> >> Doc 2: >> >> name: test >> >> type: developer >> >> >> >> Doc 3: >> >> name: test >> >> >> >> I want to search name: test and type:research if type field exists >> (result >> >> will be Doc 1 and Doc 3). >> >> >> >> How can I do that? >> >> >> >> Kind Regards, >> >> Furkan KAMACI >> >> >> > >> > >> > >> > -- >> > Sincerely yours >> > Mikhail Khludnev >> > >
Re: Filter if Field Exists
Actually, the number of documents which have the 'type' field is relatively small across all documents in the index. On Mon, Apr 17, 2017 at 7:08 AM, Alexandre Rafalovitch wrote: > What about setting a default value for the field? That is probably > faster than negative search clauses? > > Regards, >Alex. > > http://www.solr-start.com/ - Resources for Solr users, new and experienced > > > On 16 April 2017 at 23:58, Mikhail Khludnev wrote: > > +name:test +(type:research (*:* -type:[* TO *])) > > > > On Sun, Apr 16, 2017 at 11:47 PM, Furkan KAMACI > > wrote: > > > >> Hi, > >> > >> I have a schema like: > >> > >> name, > >> department, > >> type > >> > >> type is an optional field. Some documents don't have that field. Let's > >> assume I have these: > >> > >> Doc 1: > >> name: test > >> type: research > >> > >> Doc 2: > >> name: test > >> type: developer > >> > >> Doc 3: > >> name: test > >> > >> I want to search name: test and type:research if type field exists (result > >> will be Doc 1 and Doc 3). > >> > >> How can I do that? > >> > >> Kind Regards, > >> Furkan KAMACI > >> > > > > > > > > -- > > Sincerely yours > > Mikhail Khludnev >
Filter if Field Exists
Hi, I have a schema like: name, department, type type is an optional field. Some documents don't have that field. Let's assume I have these: Doc 1: name: test type: research Doc 2: name: test type: developer Doc 3: name: test I want to search name: test and type:research if type field exists (result will be Doc 1 and Doc 3). How can I do that? Kind Regards, Furkan KAMACI
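To spell out the semantics of the answer given in this thread: in (type:research (*:* -type:[* TO *])), the *:* clause is what allows the negative condition to match documents at all, because a purely negative clause only contributes a MUST_NOT to its enclosing boolean query. Without it, (type:research -type:[* TO *]) means "type is research AND type does not exist", which can never match; that is also the difference asked about in the follow-up above. A SolrJ sketch of the working form, with a placeholder core URL:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class OptionalFieldQuery {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build()) {
      // Matches "type is research" OR "document has no type field at all";
      // the *:* clause turns the pure negation into "all docs minus those
      // that have any type value".
      SolrQuery query = new SolrQuery("+name:test +(type:research (*:* -type:[* TO *]))");
      System.out.println(client.query(query).getResults().getNumFound()); // expect Doc 1 and Doc 3
    }
  }
}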
JSON Facet API Virtual Field Support
Hi, I am testing the JSON Facet API of Solr. Is it possible to create a virtual field which is generated from existing fields in the response and supports elementary arithmetic operations? Example: Schema fields: products, sold_products, date. I want to run a date range facet and add another field to the response which is the percentage of sold products (the ratio will be calculated as sold_products * 100 / products). Kind Regards, Furkan KAMACI
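Whether the arithmetic can be done fully server-side depends on the Solr version, so a conservative approach is to let the JSON Facet API compute the two sums per range bucket and derive the percentage in the client. A sketch, assuming the products/sold_products/date fields from the example and a placeholder core URL:

import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

public class SoldRatioFacet {
  @SuppressWarnings("unchecked")
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build()) {
      SolrQuery query = new SolrQuery("*:*");
      query.setRows(0);
      // Monthly range buckets over "date", each carrying two sum() aggregations.
      query.add("json.facet",
          "{byMonth:{type:range, field:date,"
          + " start:\"2017-01-01T00:00:00Z\", end:\"2017-07-01T00:00:00Z\", gap:\"+1MONTH\","
          + " facet:{products:\"sum(products)\", sold:\"sum(sold_products)\"}}}");
      QueryResponse response = client.query(query);
      NamedList<Object> facets = (NamedList<Object>) response.getResponse().get("facets");
      NamedList<Object> byMonth = (NamedList<Object>) facets.get("byMonth");
      List<NamedList<Object>> buckets = (List<NamedList<Object>>) byMonth.get("buckets");
      for (NamedList<Object> bucket : buckets) {
        Number products = (Number) bucket.get("products");
        Number sold = (Number) bucket.get("sold");
        if (products == null || products.doubleValue() == 0) continue; // empty bucket
        // The "virtual field": sold_products * 100 / products, computed client-side.
        System.out.println(bucket.get("val") + " -> " + sold.doubleValue() * 100 / products.doubleValue());
      }
    }
  }
}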
Count Dates Given A Range in a Multivalued Field
Hi All, I have a multivalued date field, i.e.: [2017-02-06T00:00:00Z,2017-02-09T00:00:00Z,2017-03-04T00:00:00Z] I want to count how many dates exist within such a field given a date range, i.e. start: 2017-02-01T00:00:00Z end: 2017-02-28T00:00:00Z and the result is 2 (2017-02-06T00:00:00Z and 2017-02-09T00:00:00Z). I want to do it with the JSON Facet API. How can I do it?
Re: Managed Schema multiValued Predict Problem
You are right, I mean schemaless mode. I saw that it's your answer ;) I've edited solrconfig.xml and fixed it. Thanks! On Mon, Mar 13, 2017 at 5:46 PM, Alexandre Rafalovitch wrote: > There is managed schema, which means it is editable via API, and there > is 'schemaless' mode that uses that to auto-define the field based on > the first occurance. > > 'schemaless' mode does not know if the field will be multi-valued the > first time it sees content for that field. So, all the fields created > automatically are multivalued. You can change the definition or you > can define the field explicitly using the API or Admin UI. > > 'schemaless' is only there really for a quick prototyping with unknown > content. > > Regards, >Alex. > P.s. That's my SO answer :-) Glad you found it useful. > > http://www.solr-start.com/ - Resources for Solr users, new and experienced > > > On 13 March 2017 at 11:15, Furkan KAMACI wrote: > > Hi, > > > > I generate dummy documents to test Solr 6.4.2. I create a field like that > > at my test code: > > > > int customCount = r.nextInt(500); > > document.addField("custom_count", customCount); > > > > This field is indexed as: > > > > org.apache.solr.schema.TrieLongField > > > > and > > > > Multivalued. > > > > I want to use FieldCache on multivalued field and don't want it to be > > multivalued. When I check managed-schema I see that: > > > >> positionIncrementGap="0" docValues="true" precisionStep="0"/> > >> positionIncrementGap="0" docValues="true" multiValued="true" > > precisionStep="0"/> > > > > So, it seems that it's predicted as longs instead of long. > > > > What is the reason behind that? > > > > Kind Regards, > > Furkan KAMACI >
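Besides editing solrconfig.xml, the guessed definition of an individual field can also be corrected through the Schema API; a SolrJ sketch, assuming the custom_count field from this thread and a single-valued "long" field type in the schema (existing documents would need reindexing afterwards):

import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;

public class FixGuessedField {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build()) {
      Map<String, Object> attrs = new LinkedHashMap<>();
      attrs.put("name", "custom_count");
      attrs.put("type", "long");        // single-valued trie long type
      attrs.put("multiValued", false);
      // Replaces the schemaless-guessed definition in the managed schema.
      new SchemaRequest.ReplaceField(attrs).process(client);
    }
  }
}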
Re: Managed Schema multiValued Predict Problem
OK, I found the answer here: http://stackoverflow.com/questions/38730035/solr-schemaless-mode-creating-fields-as-multivalued On Mon, Mar 13, 2017 at 5:15 PM, Furkan KAMACI wrote: > Hi, > > I generate dummy documents to test Solr 6.4.2. I create a field like that > at my test code: > > int customCount = r.nextInt(500); > document.addField("custom_count", customCount); > > This field is indexed as: > > org.apache.solr.schema.TrieLongField > > and > > Multivalued. > > I want to use FieldCache on multivalued field and don't want it to be > multivalued. When I check managed-schema I see that: > >positionIncrementGap="0" docValues="true" precisionStep="0"/> >positionIncrementGap="0" docValues="true" multiValued="true" > precisionStep="0"/> > > So, it seems that it's predicted as longs instead of long. > > What is the reason behind that? > > Kind Regards, > Furkan KAMACI > >
Managed Schema multiValued Predict Problem
Hi, I generate dummy documents to test Solr 6.4.2. I create a field like that at my test code: int customCount = r.nextInt(500); document.addField("custom_count", customCount); This field is indexed as: org.apache.solr.schema.TrieLongField and Multivalued. I want to use FieldCache on multivalued field and don't want it to be multivalued. When I check managed-schema I see that: So, it seems that it's predicted as longs instead of long. What is the reason behind that? Kind Regards, Furkan KAMACI
Re: Predicting Date Field at Schemaless Mode
Everything works well but the type is predicted as String instead of Date. I create just plain documents as follows: SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm"); Calendar startDate = new GregorianCalendar(2017, r.nextInt(6), r.nextInt(28)); document.addField("custom_start", simpleDateFormat.format(startDate.getTime())); ... solrClient.add(document); ... solrClient.commit(); On Mon, Mar 13, 2017 at 4:44 PM, Alexandre Rafalovitch wrote: > Any other definitions in that URP chain are triggered? > > Are you seeing this in a nested document by any chance? > > Regards, >Alex. > > http://www.solr-start.com/ - Resources for Solr users, new and experienced > > > On 13 March 2017 at 10:29, Furkan KAMACI wrote: > > Hi, > > > > I'm testing the schemaless mode of Solr 6.4.2. Solr predicts field types when > > I generate dummy data and index it to Solr. However, I could not make Solr > > predict date fields. I tried this: > > > > "custom_start":["2017-05-16T00:00"] > > > > which is a date parse result of SimpleDateFormat("yyyy-MM-dd'T'HH:mm"); > > > > and > > > > "custom_start":["2017-05-16"] > > > > from SimpleDateFormat("yyyy-MM-dd"); > > > > in both scenarios, the predicted type is: > > > > org.apache.solr.schema.StrField > > > > I use a fresh version of Solr which does not have custom modifications and > > has a proper solr.ParseDateFieldUpdateProcessorFactory definition. > > > > What am I missing? > > > > Kind Regards, > > Furkan KAMACI
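One thing worth trying here, as a hedge: emit the full ISO-8601 instant in UTC, which is the canonical form Solr itself uses for dates, rather than a minute-precision local-time string. A sketch:

import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class SolrDateFormatDemo {
  public static void main(String[] args) {
    // Full ISO-8601 instant in UTC; seconds and the trailing Z included.
    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
    format.setTimeZone(TimeZone.getTimeZone("UTC"));
    Calendar startDate = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
    startDate.clear();
    startDate.set(2017, Calendar.MAY, 16);
    System.out.println(format.format(startDate.getTime())); // 2017-05-16T00:00:00Z
  }
}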
Predicting Date Field at Schemaless Mode
Hi, I'm testing the schemaless mode of Solr 6.4.2. Solr predicts field types when I generate dummy data and index it to Solr. However, I could not make Solr predict date fields. I tried this: "custom_start":["2017-05-16T00:00"] which is a date parse result of SimpleDateFormat("yyyy-MM-dd'T'HH:mm"); and "custom_start":["2017-05-16"] from SimpleDateFormat("yyyy-MM-dd"); in both scenarios, the predicted type is: org.apache.solr.schema.StrField I use a fresh version of Solr which does not have custom modifications and has a proper solr.ParseDateFieldUpdateProcessorFactory definition. What am I missing? Kind Regards, Furkan KAMACI
Re: Query Elevation Component as a Managed Resource
Hi Jeffery, I was checking whether an issue had been raised for it or not. Thanks for pointing it out. I'm planning to create a patch. Kind Regards, Furkan KAMACI On Mon, Jan 9, 2017 at 6:44 AM, Jeffery Yuan wrote: > I am looking for the same thing. > > It seems Solr doesn't support this. > > Maybe you can vote for https://issues.apache.org/jira/browse/SOLR-6092, so > add a patch for it :) > > > > -- > View this message in context: http://lucene.472066.n3. > nabble.com/Query-Elevation-Component-as-a-Managed- > Resource-tp4312089p4313034.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Can I use SolrJ 6.3.0 to talk to a Solr 5.2.3 server?
Hi Jennifer, Take a look at index compatibility beside dependencies. Here is the explanation: Index Format Changes Solr 6 has no support for reading Lucene/Solr 4.x and earlier indexes. Be sure to run the Lucene IndexUpgrader included with Solr 5.5 if you might still have old 4x formatted segments in your index. Alternatively: fully optimize your index with Solr 5.5 to make sure it consists only of one up-to-date index segment. You can read more from here: https://cwiki.apache.org/confluence/display/solr/Major+Changes+from+Solr+5+to+Solr+6 Kind Regards, Furkan KAMACI On Tue, Jan 3, 2017 at 7:35 PM, Jennifer Coston < jennifer.cos...@raytheon.com> wrote: > Hello, > > I am running into a conflict with Solr and ElasticSearch. We are trying to > add support for Elastic Search 5.1.1 which requires Lucene 6.3.0 to an > existing system that uses Solr 5.2.3. At the moment I am using SolrJ 5.3.1 > to talk to the 5.2.3 Server. I was hoping I could just update the SolrJ > libraries to 6.3.0 so the Lucene conflict goes away, but when I try to run > my unit tests I'm seeing this error: > > java.util.ServiceConfigurationError: Cannot instantiate SPI class: > org.apache.lucene.codecs.simpletext.SimpleTextPostingsFormat > at org.apache.lucene.util.NamedSPILoader.reload( > NamedSPILoader.java:82) > at org.apache.lucene.codecs.PostingsFormat. > reloadPostingsFormats(PostingsFormat.java:132) > at org.apache.solr.core.SolrResourceLoader. > reloadLuceneSPI(SolrResourceLoader.java:237) > at org.apache.solr.core.SolrResourceLoader.( > SolrResourceLoader.java:182) > at org.apache.solr.core.SolrResourceLoader.( > SolrResourceLoader.java:142) > at org.apache.solr.core.CoreContainer.( > CoreContainer.java:217) > at com.rtn.iaf.catalog.test.SolrAnalyticClientTest. > setUpBeforeClass(SolrAnalyticClientTest.java:59) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > at sun.reflect.NativeMethodAccessorImpl.invoke( > NativeMethodAccessorImpl.java:62) > at sun.reflect.DelegatingMethodAccessorImpl.invoke( > DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.junit.runners.model.FrameworkMethod$1. > runReflectiveCall(FrameworkMethod.java:50) > at org.junit.internal.runners. > model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at org.junit.runners.model.FrameworkMethod. > invokeExplosively(FrameworkMethod.java:47) > at org.junit.internal.runners.statements.RunBefores. > evaluate(RunBefores.java:24) > at org.junit.internal.runners. > statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner. > java:363) > at org.eclipse.jdt.internal.junit4.runner. > JUnit4TestReference.run(JUnit4TestReference.java:86) > at org.eclipse.jdt.internal.junit.runner.TestExecution. > run(TestExecution.java:38) > at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner. > runTests(RemoteTestRunner.java:459) > at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner. > runTests(RemoteTestRunner.java:675) > at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner. > run(RemoteTestRunner.java:382) > at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner. > main(RemoteTestRunner.java:192) > Caused by: java.lang.IllegalAccessException: Class > org.apache.lucene.util.NamedSPILoader > can not access a member of class > org.apache.lucene.codecs.simpletext.SimpleTextPostingsFormat > with modifiers "public" > at sun.reflect.Reflection.ensureMemberAccess(Reflection. 
> java:102) > at java.lang.Class.newInstance(Class.java:436) > at org.apache.lucene.util.NamedSPILoader.reload( > NamedSPILoader.java:72) > ... 22 more > Is it possible to talk to the 5.2.3 Server using SolrJ 6.3.0? > > Here are the Solr Dependencies I have in my pom.xml: > > > > org.apache.solr > solr-solrj > 6.3.0 > > > > org.apache.solr > solr-core > 6.3.0 > test > > > jdk.tools > jdk.tools > >
Query Elevation Component as a Managed Resource
Hi, Can we access the Query Elevation Component as a Managed Resource? If not, I would like to add that functionality. Kind Regards, Furkan KAMACI
Empty Highlight Problem - Solr 6.3.0
Hi All, I'm trying the highlighter component with Solr 6.3. I have a problem when I index PDF files: even though I know the given keyword exists in a result document (it is returned as a result because of a hit in the document), the highlighting field is empty in the response. I suspect this happens for documents which have large content. How can I solve this problem? I've tried the Standard Highlighter and the FastVector Highlighter (termVectors, termPositions, and termOffsets are enabled for the hl fields) but the result is the same. Kind Regards, Furkan KAMACI
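A plausible culprit for large documents is the highlighter's analysis cap: by default the standard highlighter only inspects roughly the first 51,200 characters of a field (hl.maxAnalyzedChars), so a match beyond that point yields an empty highlight entry even though the document itself matched. A SolrJ sketch of raising it; the limit value here is arbitrary:

import org.apache.solr.client.solrj.SolrQuery;

public class HighlightLargeDocs {
  public static void main(String[] args) {
    SolrQuery query = new SolrQuery("content:intelligent");
    query.setHighlight(true);
    query.set("hl.fl", "content");
    // Raise the per-field analysis cap so matches deep inside large
    // documents can still be snippeted by the standard highlighter.
    query.set("hl.maxAnalyzedChars", 1000000);
    System.out.println(query);
  }
}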
FuzzyLookupFactory throws StackOverflowError
Hi, When I try the suggester component and use FuzzyLookupFactory I get this error: "error": { "msg": "java.lang.StackOverflowError", "trace": "java.lang.RuntimeException: FuzzyLookupFactory n\tat org.apache.solr.servlet.HttpSolrCall.sendError(HttpSolrCall.java:607)\n\tat I searched on the web and there are some other people who get that error too. Responses to such questions indicate that it may be usual if there is a lot of data in the index. However, I just indexed 4 small PDF files and get that error when I build the suggester. Any ideas? Kind Regards, Furkan KAMACI
Limit Suggested Term Counts
I have a list I want to make suggestions on. When I check the analyser page I see that the field is analysed as I intended, i.e. the tokens are: java linux mac However, when I use BlendedInfixLookupFactory to run a suggestion on that field it returns the whole paragraph instead of a limited number of terms (I know that such implementations return suggestions even when the desired terms are inside the field value, not at the beginning). Is it possible to limit that suggested term count? Kind Regards, Furkan KAMACI
Re: Solr Suggester
Hi Emir, As far as I know, it should be enough to be stored=true for a suggestion field? Should it be both indexed and stored? Kind Regards, Furkan KAMACI On Thu, Dec 22, 2016 at 11:31 AM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: > That is because my_field_2 is not indexed. > > Regards, > Emir > > > On 21.12.2016 18:04, Furkan KAMACI wrote: > >> Hi All, >> >> I've a field like that: >> >> > multiValued="false" /> >> >> > stored="true" multiValued="false"/> >> >> When I run a suggester on my_field_1 it returns response. However >> my_field_2 doesn't. I've defined suggester as: >> >>suggester >>FuzzyLookupFactory >>DocumentDictionaryFactory >> >> What can be the reason? >> >> Kind Regards, >> Furkan KAMACI >> >> > -- > Monitoring * Alerting * Anomaly Detection * Centralized Log Management > Solr & Elasticsearch Support * http://sematext.com/ > >
Solr Suggester
Hi All, I've a field like that: When I run a suggester on my_field_1 it returns response. However my_field_2 doesn't. I've defined suggester as: suggester FuzzyLookupFactory DocumentDictionaryFactory What can be the reason? Kind Regards, Furkan KAMACI
Re: Soft commit and reading data just after the commit
Hi Lasitha, First of all, did you check these: https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ After that, if you cannot adjust your configuration, you can give more information and we can find a solution. Kind Regards, Furkan KAMACI On Sun, Dec 18, 2016 at 2:28 PM, Lasitha Wattaladeniya wrote: > Hi furkan, > > Thanks for your reply, it is generally a query heavy system. We are using > realtime indexing for editing the available data > > Regards, > Lasitha > > Lasitha Wattaladeniya > Software Engineer > > Mobile : +6593896893 > Blog : techreadme.blogspot.com > > On Sun, Dec 18, 2016 at 8:12 PM, Furkan KAMACI > wrote: > >> Hi Lasitha, >> >> What are your indexing / querying requirements? Do you have an index >> heavy/light - query heavy/light system? >> >> Kind Regards, >> Furkan KAMACI >> >> On Sun, Dec 18, 2016 at 11:35 AM, Lasitha Wattaladeniya < >> watt...@gmail.com> >> wrote: >> >> > Hello devs, >> > >> > I'm here with another problem I'm facing. I'm trying to do a commit (soft >> > commit) through solrj and just after the commit, retrieve the data from >> > solr (the requirement is to get an updated data list). >> > >> > I'm using soft commit instead of the hard commit, as previously I got an >> > error "Exceeded limit of maxWarmingSearchers=2, try again later" >> because of >> > too many commit requests. Now I have removed the explicit commit and have >> > let Solr do the commit using autoSoftCommit *(1 millisecond)* and >> > autoCommit *(30 seconds)* configurations. Now I'm not getting any errors >> > when I'm committing frequently. >> > >> > The problem I'm facing now is, I'm not getting the updated data when I >> > fetch from solr just after the soft commit. So in this case what are the >> > best practices to use? To wait 1 millisecond before retrieving data >> after >> > the soft commit? I don't feel like waiting from the client side is a good >> option. >> > Please give me some help from your expert knowledge >> > >> > Best regards, >> > Lasitha Wattaladeniya >> > Software Engineer >> > >> > Mobile : +6593896893 >> > Blog : techreadme.blogspot.com >> > >> >
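If the client really needs to read its own writes, one option, at the cost of reintroducing explicit commits and therefore to be used sparingly, is a soft commit that blocks until the new searcher is open; a SolrJ sketch with a placeholder core URL:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ReadYourWrites {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/collection1").build()) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "1");
      client.add(doc);
      // waitFlush=true, waitSearcher=true, softCommit=true: the call does
      // not return until the new searcher is open, so a query issued right
      // after this line will see the document.
      client.commit(true, true, true);
    }
  }
}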
Re: Confusing debug=timing parameter
Hi, Let me explain the *time parameters in Solr*: The *timing* parameter of debug returns information about how long the query took to process. *Query time* shows how long it took Solr to get the search results. It doesn't include reading bits from disk, etc. Also, there is another measure named *elapsed time*. It covers the time frame from when the query is sent to Solr until the response is returned. It includes query time, reading bits from disk, constructing the response and transmitting it, etc. Kind Regards, Furkan KAMACI On Sat, Dec 17, 2016 at 6:43 PM, S G wrote: > Hi, > > I am using Solr 4.10 and its response time for the clients is not very > good. > Even though the Solr's plugin/stats shows less than 200 milliseconds, > clients report several seconds in response time. > > So I tried using debug-timing parameter from the Solr UI and this is what I > got. > Note how the QTime is 2978 while the time in debug-timing is 19320. > > What does this mean? > How can Solr return a result in 3 seconds when time taken between two > points in the same path is 20 seconds ? > > { > "responseHeader": { > "status": 0, > "QTime": 2978, > "params": { > "q": "*:*", > "debug": "timing", > "indent": "true", > "wt": "json", > "_": "1481992653008" > } > }, > "response": { > "numFound": 1565135270, > "start": 0, > "maxScore": 1, > "docs": [ > > ] > }, > "debug": { > "timing": { > "time": 19320, > "prepare": { > "time": 4, > "query": { > "time": 3 > }, > "facet": { > "time": 0 > }, > "mlt": { > "time": 0 > }, > "highlight": { > "time": 0 > }, > "stats": { > "time": 0 > }, > "expand": { > "time": 0 > }, > "debug": { > "time": 0 > } > }, > "process": { > "time": 19315, > "query": { > "time": 19309 > }, > "facet": { > "time": 0 > }, > "mlt": { > "time": 1 > }, > "highlight": { > "time": 0 > }, > "stats": { > "time": 0 > }, > "expand": { > "time": 0 > }, > "debug": { > "time": 5 > } > } > } > } > }
Re: Soft commit and reading data just after the commit
Hi Lasitha, What are your indexing / querying requirements? Do you have an index heavy/light - query heavy/light system? Kind Regards, Furkan KAMACI On Sun, Dec 18, 2016 at 11:35 AM, Lasitha Wattaladeniya wrote: > Hello devs, > > I'm here with another problem I'm facing. I'm trying to do a commit (soft > commit) through solrj and just after the commit, retrieve the data from > solr (the requirement is to get an updated data list). > > I'm using soft commit instead of the hard commit, as previously I got an > error "Exceeded limit of maxWarmingSearchers=2, try again later" because of > too many commit requests. Now I have removed the explicit commit and have > let Solr do the commit using autoSoftCommit *(1 millisecond)* and > autoCommit *(30 seconds)* configurations. Now I'm not getting any errors > when I'm committing frequently. > > The problem I'm facing now is, I'm not getting the updated data when I > fetch from solr just after the soft commit. So in this case what are the > best practices to use? To wait 1 millisecond before retrieving data after > the soft commit? I don't feel like waiting from the client side is a good option. > Please give me some help from your expert knowledge > > Best regards, > Lasitha Wattaladeniya > Software Engineer > > Mobile : +6593896893 > Blog : techreadme.blogspot.com >
Checking Optimal Values for BM25
Hi, Solr's default similarity is BM25 now. Its parameters are defined as k1=1.2, b=0.75 by default. However, is there any way to check the effect of using different coefficients when calculating BM25, in order to find the optimal values? Kind Regards, Furkan KAMACI
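In Solr itself, k1 and b are normally fixed per field type in the schema (via a similarity factory such as solr.BM25SimilarityFactory), so there is no built-in per-query sweep. One hedged way to explore coefficients offline is to open the Lucene index directly and swap the similarity at search time, since BM25's parameters are applied at scoring time; a sketch assuming a Lucene index at data/index and a judged query to compare rankings with:

import java.nio.file.Paths;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.similarities.BM25Similarity;
import org.apache.lucene.store.FSDirectory;

public class Bm25CoefficientSweep {
  public static void main(String[] args) throws Exception {
    try (DirectoryReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("data/index")))) {
      IndexSearcher searcher = new IndexSearcher(reader);
      for (float k1 : new float[] {0.8f, 1.2f, 2.0f}) {
        for (float b : new float[] {0.3f, 0.75f}) {
          // Swap coefficients without reindexing and compare the rankings
          // (or a relevance metric over judged queries) per combination.
          searcher.setSimilarity(new BM25Similarity(k1, b));
          TopDocs top = searcher.search(new TermQuery(new Term("content", "test")), 10);
          if (top.scoreDocs.length > 0) {
            System.out.printf("k1=%.1f b=%.2f top score=%.4f%n", k1, b, top.scoreDocs[0].score);
          }
        }
      }
    }
  }
}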
Setting Shard Count at Initial Startup of SolrCloud
Hi, I have an external ZooKeeper. I don't want to use SolrCloud in its test mode (with embedded ZooKeeper). I upload configs to ZooKeeper: server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd upconfig -confdir server/solr/my_collection/conf -confname my_collection Then I start the servers: Server 1: bin/solr start -cloud -d server -p 8983 -z localhost:2181 Server 2: bin/solr start -cloud -d server -p 8984 -z localhost:2181 As usual, the shard count will be 1 with this approach. I want 2 shards. I know that I can create a collection with shards via: bin/solr create However, I would have to delete the existing collection before I can create it with shards. Is there any possibility to set the number of shards, maximum shards per node, etc. at the initial start of Solr? Kind Regards, Furkan KAMACI
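If issuing one explicit create call is acceptable, the Collections API lets the shard layout be set in the same step; a SolrJ sketch, assuming SolrJ 6.x, the uploaded my_collection config set, and the ZooKeeper address from the mail above:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateShardedCollection {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder().withZkHost("localhost:2181").build()) {
      // 2 shards, 1 replica each, using the config set already uploaded
      // to ZooKeeper under the name my_collection.
      CollectionAdminRequest.createCollection("my_collection", "my_collection", 2, 1)
          .setMaxShardsPerNode(1)
          .process(client);
    }
  }
}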
Map Highlight Field into Another Field
Hi, One can use * in highlight fields, like: content_* so that both content_de and content_en match it. However, the response will include such fields separately: "highlighting":{ "my query":{ "content_de": "content_en": ... Is it possible to map matched fields onto a predefined field, like: content_* => content so that one can handle a generic name for such cases in the response? If not, I can implement such a feature. Kind Regards, Furkan KAMACI
Copying Tokens
Hi, I'm testing language identification. I've enabled it in solrconfig.xml. Here are my dynamic fields in the schema: So, after indexing, I see that these fields are generated: content_en content_ru I copy my fields into a text field: Here is my text field: I want to let users search only on the *text* field. However, when I copy those fields into the *text* field, they are indexed according to text_general. How can I copy *tokens* to the *text* field? Kind Regards, Furkan KAMACI
Re: Unicode Character Problem
Hi Ahmet, I don't see any weird characters when I manually copy it to a text editor. On Sat, Dec 10, 2016 at 6:19 PM, Ahmet Arslan wrote: > Hi Furkan, > > I am pretty sure this is a pdf extraction thing. > Turkish characters caused us trouble in the past during extracting text > from pdf files. > You can confirm by performing manual copy-paste from original pdf file. > > Ahmet > > > On Friday, December 9, 2016 8:44 PM, Furkan KAMACI > wrote: Hi, > > I'm trying to index Turkish characters. These are what I see at my index (I > see both of them at different places of my content): > > aç �klama > açıklama > > These are same words but indexed different (same weird character at first > one). I see that there is not a weird character when I check the original > PDF file. > > What do you think about it. Is it related to Solr or Tika? > > PS: I use text_general for analyser of content field. > > Kind Regards, > Furkan KAMACI >
Unicode Character Problem
Hi, I'm trying to index Turkish characters. These are what I see in my index (I see both of them at different places in my content): aç �klama açıklama These are the same words but they are indexed differently (note the weird character in the first one). I see that there is no weird character when I check the original PDF file. What do you think about it? Is it related to Solr or Tika? PS: I use text_general as the analyser of the content field. Kind Regards, Furkan KAMACI
Re: LukeRequestHandler Error getting file length for [segments_1l]
No OOM, no corrupted index. Just a clean install with a few documents. Similar to this: http://lucene.472066.n3.nabble.com/NoSuchFileException-errors-common-on-version-5-5-0-td4263072.html On Wed, Nov 30, 2016 at 3:19 AM, Shawn Heisey wrote: > On 11/29/2016 8:40 AM, halis Yılboğa wrote: > > it is not normal to get that many errors, actually. The main problem should be > > from your index. It seems to me your index is corrupted. > > > > On Tue, Nov 29, 2016 at 14:40, Furkan KAMACI > > wrote: > > > >> On the other hand, my Solr instance stops frequently due to such errors: > >> > >> 2016-11-29 12:25:36.962 WARN (qtp1528637575-14) [ x:collection1] > >> o.a.s.h.a.LukeRequestHandler Error getting file length for [segments_c] > >> java.nio.file.NoSuchFileException: data/index/segments_c > > If your Solr instance is actually stopping, I would suspect the OOM > script, assuming a non-windows system. On non-windows systems, recent > versions of Solr have a script that forcibly terminates Solr in the > event of an OutOfMemoryError. This script has its own log, which would > be in the same place as solr.log. > > I've never heard of Solr actually crashing on a normally configured > system, and I'm reasonably sure that the message you've indicated is not > something that would cause a crash. In fact, I've never seen it cause > any real issues, just the warning message. > > Thanks, > Shawn >
Re: LukeRequestHandler Error getting file length for [segments_1l]
us=0 QTime=7 2016-11-29 12:26:03.869 INFO (Thread-0) [ ] o.e.j.s.ServerConnector Stopped ServerConnector@3a52dba3{HTTP/1.1,[http/1.1]}{0.0.0.0:9983} 2016-11-29 12:26:03.870 INFO (Thread-0) [ ] o.a.s.c.CoreContainer Shutting down CoreContainer instance=226744878 2016-11-29 12:26:03.871 INFO (coreCloseExecutor-12-thread-1) [ x:collection1] o.a.s.c.SolrCore [collection1] CLOSING SolrCore org.apache.solr.core.SolrCore@447dc7d4 2016-11-29 12:26:03.884 WARN (Thread-0) [ ] o.e.j.s.ServletContextHandler ServletContextHandler.setHandler should not be called directly. Use insertHandler or setSessionHandler etc. On Tue, Nov 29, 2016 at 1:15 PM, Furkan KAMACI wrote: > I use Solr 6.3 and get too many warning about. Is it usual: > > WARN true LukeRequestHandler Error getting file length for [segments_1l] > java.nio.file.NoSuchFileException: /home/server/solr/collection1/ > data/index/segments_1l > at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) > at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) > at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes( > UnixFileAttributeViews.java:55) > at sun.nio.fs.UnixFileSystemProvider.readAttributes( > UnixFileSystemProvider.java:144) > at sun.nio.fs.LinuxFileSystemProvider.readAttributes( > LinuxFileSystemProvider.java:99) > at java.nio.file.Files.readAttributes(Files.java:1737) > at java.nio.file.Files.size(Files.java:2332) > at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243) > at org.apache.lucene.store.NRTCachingDirectory.fileLength( > NRTCachingDirectory.java:128) > at org.apache.solr.handler.admin.LukeRequestHandler.getFileLength( > LukeRequestHandler.java:598) > at org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo( > LukeRequestHandler.java:586) > at org.apache.solr.handler.admin.LukeRequestHandler.handleRequestBody( > LukeRequestHandler.java:137) > at org.apache.solr.handler.RequestHandlerBase.handleRequest( > RequestHandlerBase.java:153) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2213) > at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654) > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460) > at org.apache.solr.servlet.SolrDispatchFilter.doFilter( > SolrDispatchFilter.java:303) > at org.apache.solr.servlet.SolrDispatchFilter.doFilter( > SolrDispatchFilter.java:254) > at org.eclipse.jetty.servlet.ServletHandler$CachedChain. > doFilter(ServletHandler.java:1668) > at org.eclipse.jetty.servlet.ServletHandler.doHandle( > ServletHandler.java:581) > at org.eclipse.jetty.server.handler.ScopedHandler.handle( > ScopedHandler.java:143) > at org.eclipse.jetty.security.SecurityHandler.handle( > SecurityHandler.java:548) > at org.eclipse.jetty.server.session.SessionHandler. > doHandle(SessionHandler.java:226) > at org.eclipse.jetty.server.handler.ContextHandler. > doHandle(ContextHandler.java:1160) > at org.eclipse.jetty.servlet.ServletHandler.doScope( > ServletHandler.java:511) > at org.eclipse.jetty.server.session.SessionHandler. > doScope(SessionHandler.java:185) > at org.eclipse.jetty.server.handler.ContextHandler. > doScope(ContextHandler.java:1092) > at org.eclipse.jetty.server.handler.ScopedHandler.handle( > ScopedHandler.java:141) > at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle( > ContextHandlerCollection.java:213) > at org.eclipse.jetty.server.handler.HandlerCollection. 
> handle(HandlerCollection.java:119) > at org.eclipse.jetty.server.handler.HandlerWrapper.handle( > HandlerWrapper.java:134) > at org.eclipse.jetty.server.Server.handle(Server.java:518) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308) > at org.eclipse.jetty.server.HttpConnection.onFillable( > HttpConnection.java:244) > at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded( > AbstractConnection.java:273) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95) > at org.eclipse.jetty.io.SelectChannelEndPoint$2.run( > SelectChannelEndPoint.java:93) > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume. > produceAndRun(ExecuteProduceConsume.java:246) > at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run( > ExecuteProduceConsume.java:156) > at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob( > QueuedThreadPool.java:654) > at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run( > QueuedThreadPool.java:572) > at java.lang.Thread.run(Thread.java:745) > > Kind Regards, > Furkan KAMACI >
LukeRequestHandler Error getting file length for [segments_1l]
I use Solr 6.3 and get too many warning about. Is it usual: WARN true LukeRequestHandler Error getting file length for [segments_1l] java.nio.file.NoSuchFileException: /home/server/solr/collection1/data/index/segments_1l at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55) at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144) at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99) at java.nio.file.Files.readAttributes(Files.java:1737) at java.nio.file.Files.size(Files.java:2332) at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243) at org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:128) at org.apache.solr.handler.admin.LukeRequestHandler.getFileLength(LukeRequestHandler.java:598) at org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:586) at org.apache.solr.handler.admin.LukeRequestHandler.handleRequestBody(LukeRequestHandler.java:137) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:153) at org.apache.solr.core.SolrCore.execute(SolrCore.java:2213) at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) at org.eclipse.jetty.server.Server.handle(Server.java:518) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95) at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156) at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572) at java.lang.Thread.run(Thread.java:745) Kind Regards, Furkan KAMACI
Highlight is Empty for A Matched Query
My content has this line: \n \n\n Intelligent En When I search for *intelligent* it returns 1 response as well. My content field is defined as: The highlighter is the default too. I just set *highlight=on* and *hl.field=content*. However, my response does not have any highlights. When I try different keywords, some query keywords have a highlight section and some do not. What may be the problem? I didn't edit stopwords, synonyms, etc. Kind Regards, Furkan KAMACI
Re: Metadata and Newline Characters at Content
PS: \n characters are not shown in browser but breaks how highlighter work. \n characters are considered at fragsize too. On Sat, Nov 26, 2016 at 9:47 PM, Furkan KAMACI wrote: > Hi Erick, > > I resolved my metadata problem with configuring solrconfig.xml However > even I post data with post.sh I see content as like: > > CANADA �1 \n \n \n \n Place > > I have newline characters as \n and some non-ASCII characters. As far as I > understand it is usual to have such characters because that is a pdf file > and its newline characters are interpreted as *\n* at Solr. How can I > remove them (\n and non-ASCII characters). > > Kind Regards, > Furkan KAMACI > > On Thu, Nov 24, 2016 at 8:58 PM, Erick Erickson > wrote: > >> Not sure. What have you tried? >> >> For production situations or when you want to take total control of >> the indexing process,I strongly recommend that you put the Tika >> parsing on the _client_. >> >> Here's a writeup on this topic: >> >> https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/ >> >> Best, >> Erick >> >> On Thu, Nov 24, 2016 at 10:37 AM, Furkan KAMACI >> wrote: >> > Hi Erick, >> > >> > When I check the *Solr* documentation I see that [1]: >> > >> > *In addition to Tika's metadata, Solr adds the following metadata >> (defined >> > in ExtractingMetadataConstants):* >> > >> > *"stream_name" - The name of the ContentStream as uploaded to Solr. >> > Depending on how the file is uploaded, this may or may not be set.* >> > *"stream_source_info" - Any source info about the stream. See >> > ContentStream.* >> > *"stream_size" - The size of the stream in bytes(?)* >> > *"stream_content_type" - The content type of the stream, if available.* >> > >> > So, it seems that these may not be added by Tika, but Solr. Do you know >> how >> > to enable/disable this feature? >> > >> > Kind Regards, >> > Furkan KAMACI >> > >> > [1] https://wiki.apache.org/solr/ExtractingRequestHandler >> > >> > On Thu, Nov 24, 2016 at 6:51 PM, Erick Erickson < >> erickerick...@gmail.com> >> > wrote: >> > >> >> about PatternCaptureGroupFilterFactory. This isn't going to help. The >> >> data you see when you return stored data is _before_ any analysis so >> >> the PatternFactory won't be applied. You could do this in a >> >> ScriptUpdateProcessorFactory. Or, just don't worry about it and have >> >> the real app deal with it. >> >> >> >> I don't particularly know about the Tika settings, that's largely a >> guess. >> >> >> >> Best, >> >> Erick >> >> >> >> On Thu, Nov 24, 2016 at 8:43 AM, Furkan KAMACI > > >> >> wrote: >> >> > Hi Erick, >> >> > >> >> > 1) I am looking stored data via Solr Admin UI. I send the query and >> check >> >> > what is in content field. >> >> > >> >> > 2) I can debug the Tika settings if you think that this is not the >> >> desired >> >> > behaviour to have such metadata fields combined into content field. >> >> > >> >> > *PS: *Is there any solution to get rid of it except for >> >> > using PatternCaptureGroupFilterFactory? >> >> > >> >> > Kind Regards, >> >> > Furkan KAMACI >> >> > >> >> > On Thu, Nov 24, 2016 at 6:31 PM, Erick Erickson < >> erickerick...@gmail.com >> >> > >> >> > wrote: >> >> > >> >> >> 1> I'm assuming when you "see" this data you're looking at the >> stored >> >> >> data, right? It's a verbatim copy of whatever you sent to the field. >> >> >> I'm guessing it's a character-encoding mismatch between the source >> and >> >> >> what you use to display. >> >> >> >> >> >> 2> How are you extracting this data? 
Re: Metadata and Newline Characters at Content
Hi Erick, I resolved my metadata problem by configuring solrconfig.xml. However, even when I post data with post.sh, I see content like: CANADA �1 \n \n \n \n Place I have newline characters as \n and some non-ASCII characters. As far as I understand it is usual to have such characters, because that is a PDF file and its newline characters are interpreted as *\n* by Solr. How can I remove them (the \n and non-ASCII characters)? Kind Regards, Furkan KAMACI On Thu, Nov 24, 2016 at 8:58 PM, Erick Erickson wrote: > Not sure. What have you tried? > > For production situations, or when you want to take total control of > the indexing process, I strongly recommend that you put the Tika > parsing on the _client_. > > Here's a writeup on this topic: > > https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/ > > Best, > Erick
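For reference, one way to strip such characters at index time is an update request processor chain; a sketch for solrconfig.xml using solr.RegexReplaceProcessorFactory (the chain name, field name, and pattern are illustrative assumptions):

<updateRequestProcessorChain name="strip-control-chars">
  <!-- replace runs of newlines and other non-printable-ASCII characters in "content" with a single space -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">content</str>
    <str name="pattern">[^\x20-\x7E]+</str>
    <str name="replacement"> </str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

The chain would then be selected per request with update.chain=strip-control-chars.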
ClassicIndexSchemaFactory with Solr 6.3
Hi, I'm trying Solr 6.3. I don't want to use the Managed Schema; that was fine with Solr 5.x. However, the solrconfig.xml of Solr 6.3 doesn't have a ManagedIndexSchemaFactory definition, and the documentation is wrong on this point (https://cwiki.apache.org/confluence/display/solr/Schema+Factory+Definition+in+SolrConfig). How can I use ClassicIndexSchemaFactory with Solr 6.3? Kind Regards, Furkan KAMACI
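For reference, switching back to the classic schema is typically a matter of declaring the factory explicitly in solrconfig.xml and keeping a schema.xml file (rather than a managed-schema file) in the conf directory; a minimal sketch:

<schemaFactory class="ClassicIndexSchemaFactory"/>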
Re: Metadata and Newline Characters at Content
Hi Erick, When I check the *Solr* documentation I see the following [1]: *In addition to Tika's metadata, Solr adds the following metadata (defined in ExtractingMetadataConstants):* *"stream_name" - The name of the ContentStream as uploaded to Solr. Depending on how the file is uploaded, this may or may not be set.* *"stream_source_info" - Any source info about the stream. See ContentStream.* *"stream_size" - The size of the stream in bytes(?)* *"stream_content_type" - The content type of the stream, if available.* So, it seems that these may be added not by Tika, but by Solr. Do you know how to enable/disable this feature? Kind Regards, Furkan KAMACI [1] https://wiki.apache.org/solr/ExtractingRequestHandler On Thu, Nov 24, 2016 at 6:51 PM, Erick Erickson wrote: > About PatternCaptureGroupFilterFactory: this isn't going to help. The > data you see when you return stored data is _before_ any analysis, so > the PatternFactory won't be applied. You could do this in a > ScriptUpdateProcessorFactory. Or, just don't worry about it and have > the real app deal with it. > > I don't particularly know about the Tika settings, that's largely a guess. > > Best, > Erick
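For reference, unwanted extraction metadata can usually be redirected to throwaway fields via the extract handler's uprefix parameter; a sketch for solrconfig.xml (this assumes the stock handler definition and an ignored_* dynamic field of type "ignored" in the schema):

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">content</str>
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>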
Re: Metadata and Newline Characters at Content
Hi Erick, 1) I am looking at the stored data via the Solr Admin UI. I send the query and check what is in the content field. 2) I can debug the Tika settings if you think it is not the desired behaviour to have such metadata fields combined into the content field. *PS:* Is there any solution to get rid of it other than using PatternCaptureGroupFilterFactory? Kind Regards, Furkan KAMACI On Thu, Nov 24, 2016 at 6:31 PM, Erick Erickson wrote: > 1> I'm assuming that when you "see" this data you're looking at the stored > data, right? It's a verbatim copy of whatever you sent to the field. > I'm guessing it's a character-encoding mismatch between the source and > what you use to display. > > 2> How are you extracting this data? There are Tika options, I think, > that can/do mush fields together. > > Best, > Erick
Metadata and Newline Characters at Content
Hi, I'm testing Solr 4.9.1. I've indexed documents via it. The content field in the schema has the text_general field type, which is not modified from the original. I do not copy any fields to content. When I check the data I see content values like: " \n \nstream_source_info MARLON BRANDO.rtf \nstream_content_type application/rtf \nstream_size 13580 \nstream_name MARLON BRANDO.rtf \nContent-Type application/rtf \nresourceName MARLON BRANDO.rtf \n \n \n 1. Vivien Leigh and Marlon Brando in \"A Streetcar Named Desire\" directed by Elia Kazan \n" My questions: 1) Is it usual to have these newline characters? 2) Is it usual to have file metadata at the beginning of the content (i.e. stream_source_info, stream_content_type), or is it related to the tool I use to post data to Solr? Kind Regards, Furkan KAMACI
Overlapped Gap Facets
Is it possible to do such a facet on a date field:

Last 1 Day
Last 1 Week
Last 1 Month
Last 6 Months
Last 1 Year
Older than 1 Year

i.e. with overlapping facet gaps? Kind Regards, Furkan KAMACI
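Such overlapping buckets don't fit a single facet.range, but a set of facet.query parameters can express them, since each facet.query is counted independently and a document may fall into several buckets at once; a sketch assuming a date field named timestamp:

facet=true
facet.query=timestamp:[NOW-1DAY TO NOW]
facet.query=timestamp:[NOW-7DAYS TO NOW]
facet.query=timestamp:[NOW-1MONTH TO NOW]
facet.query=timestamp:[NOW-6MONTHS TO NOW]
facet.query=timestamp:[NOW-1YEAR TO NOW]
facet.query=timestamp:[* TO NOW-1YEAR]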
Re: Aggregate Values Inside a Facet Range
Yes, it works with hours too. You can run a sum function on each hourly facet, which is called a bucket. On Nov 4, 2016 10:14 PM, "William Bell" wrote: > How about hours? > > NOW+1HR > NOW+2HR > NOW+12HR > NOW-4HR > > Can we add that? > > On Fri, Nov 4, 2016 at 12:25 PM, Furkan KAMACI > wrote: > > I have documents like this: > > > > id:5 > > timestamp:NOW //pseudo date representation > > count:13 > > > > id:4 > > timestamp:NOW //pseudo date representation > > count:3 > > > > id:3 > > timestamp:NOW-1DAY //pseudo date representation > > count:21 > > > > id:2 > > timestamp:NOW-1DAY //pseudo date representation > > count:29 > > > > id:1 > > timestamp:NOW-3DAY //pseudo date representation > > count:4 > > > > Faceting the last 3 days of data by timestamp is OK. However, my need > > is this: > > > > facets: > > TODAY: 16 //pseudo representation > > TODAY - 1: 50 //pseudo date representation > > TODAY - 2: 0 //pseudo date representation > > TODAY - 3: 4 //pseudo date representation > > > > I mean, I have to facet by dates and aggregate values inside that facet > > range. Is it possible to do that without multiple queries at Solr? > > > > Kind Regards, > > Furkan KAMACI > > -- > Bill Bell > billnb...@gmail.com > cell 720-256-8076
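A minimal sketch of such a request with the JSON Facet API, reusing the timestamp and count fields from the example above, with hourly buckets (the bucket boundaries are illustrative assumptions):

json.facet={
  by_hour: {
    type: range,
    field: timestamp,
    start: "NOW-12HOURS",
    end: "NOW",
    gap: "+1HOUR",
    facet: { total: "sum(count)" }
  }
}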
Re: Aggregate Values Inside a Facet Range
It seems that SolrJ doesn't support the JSON Facet API yet. On Fri, Nov 4, 2016 at 9:08 PM, Furkan KAMACI wrote: > Fantastic! Thanks Yonik, I could do what I wanted with the JSON Facet > API. > > On Fri, Nov 4, 2016 at 8:42 PM, Yonik Seeley wrote: >> On Fri, Nov 4, 2016 at 2:25 PM, Furkan KAMACI >> wrote: >> > I mean, I have to facet by dates and aggregate values inside that facet >> > range. Is it possible to do that without multiple queries at Solr? >> >> This (old) blog shows a percentiles calculation under a range facet: >> http://yonik.com/percentiles-for-solr-faceting/ >> >> -Yonik
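Even without first-class SolrJ support, a json.facet request can usually be passed through as a raw query parameter, since SolrQuery extends ModifiableSolrParams; a sketch (the collection URL and field names are illustrative assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class JsonFacetViaSolrJ {
  public static void main(String[] args) throws Exception {
    // HttpSolrClient targets a single core/collection URL
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
      SolrQuery query = new SolrQuery("*:*");
      query.setRows(0); // only the facet results are needed
      // add the JSON facet request as a plain parameter
      query.add("json.facet",
          "{ by_day: { type: range, field: timestamp,"
          + " start: \"NOW/DAY-3DAYS\", end: \"NOW\", gap: \"+1DAY\","
          + " facet: { total: \"sum(count)\" } } }");
      QueryResponse rsp = client.query(query);
      // JSON facet results come back under the "facets" key of the raw response
      System.out.println(rsp.getResponse().get("facets"));
    }
  }
}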