[jira] [Commented] (SOLR-8307) XXE Vulnerability
[ https://issues.apache.org/jira/browse/SOLR-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011180#comment-15011180 ] Erik Hatcher commented on SOLR-8307: [~thetaphi] looks like the diff feature of the admin UI sends XML (it got from Solr) to SolrInfoMBeanHandler with diff=true. And then that XML is parsed by XMLResponseParser. Looks like a legit vector. > XXE Vulnerability > - > > Key: SOLR-8307 > URL: https://issues.apache.org/jira/browse/SOLR-8307 > Project: Solr > Issue Type: Bug > Components: UI >Affects Versions: 5.3 >Reporter: Adam Johnson > Attachments: SOLR-8307.patch > > > Use the drop-down in the left menu to select a core. Use the “Watch Changes” > feature under the “Plugins / Stats” option. When submitting the changes, XML > is passed in the “stream.body” parameter and is vulnerable to XXE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8307) XXE Vulnerability
[ https://issues.apache.org/jira/browse/SOLR-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011225#comment-15011225 ] Uwe Schindler commented on SOLR-8307: - OK. So a misuse of the response parser. This is why it is a problem. Thanks. I would fix this with the entity resolver. Nothing more to do. > XXE Vulnerability > - > > Key: SOLR-8307 > URL: https://issues.apache.org/jira/browse/SOLR-8307 > Project: Solr > Issue Type: Bug > Components: UI >Affects Versions: 5.3 >Reporter: Adam Johnson >Priority: Blocker > Fix For: 5.4 > > Attachments: SOLR-8307.patch > > > Use the drop-down in the left menu to select a core. Use the “Watch Changes” > feature under the “Plugins / Stats” option. When submitting the changes, XML > is passed in the “stream.body” parameter and is vulnerable to XXE.
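[Editor's sketch] The hardening Uwe describes (refusing DTDs and external entities in the response parser) can be illustrated with standard StAX factory properties. This is not the SOLR-8307 patch itself; the class and method names below are illustrative.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;

public class XxeSafeParser {

    // Hardened StAX factory: no DTDs, no external entity resolution.
    static XMLInputFactory newSafeFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        return factory;
    }

    // True if the hardened parser refuses the document, either at the
    // DOCTYPE itself or at the now-undeclared entity reference.
    static boolean isRejected(String xml) {
        try {
            XMLStreamReader reader = newSafeFactory().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.next();
            }
            return false;
        } catch (XMLStreamException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String attack = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>"
                + "<response>&xxe;</response>";
        System.out.println(isRejected(attack));                    // classic XXE payload refused
        System.out.println(isRejected("<response>ok</response>")); // plain XML still parses
    }
}
```

A DTD-aware entity resolver that returns empty input for every external entity (the approach Uwe mentions) achieves a similar effect while still tolerating documents that carry a harmless DOCTYPE.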
Re: Welcome Areek Zillur to the Lucene / Solr PMC
Thanks all for the wonderful opportunity :) -Areek On Thu, Nov 12, 2015 at 12:58 AM, Dawid Weisswrote: > Congratulations and welcome, Areek! > Dawid > > On Thu, Nov 12, 2015 at 8:46 AM, Shalin Shekhar Mangar > wrote: > > Welcome Areek! > > > > On Thu, Nov 12, 2015 at 2:18 AM, Simon Willnauer > > wrote: > >> I'm pleased to announce that Areek has accepted the PMC's invitation to > >> join. > >> > >> Welcome Areek! > >> > >> Simon > > > > > > > > -- > > Regards, > > Shalin Shekhar Mangar. > > > > - > > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > > For additional commands, e-mail: dev-h...@lucene.apache.org > > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > >
[JENKINS] Lucene-Solr-5.x-Linux (64bit/jdk1.7.0_80) - Build # 14662 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/14662/ Java: 64bit/jdk1.7.0_80 -XX:-UseCompressedOops -XX:+UseParallelGC All tests passed Build Log: [...truncated 47831 lines...] BUILD FAILED /home/jenkins/workspace/Lucene-Solr-5.x-Linux/build.xml:785: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-5.x-Linux/build.xml:101: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/build.xml:137: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/build.xml:475: The following error occurred while executing this line: /home/jenkins/workspace/Lucene-Solr-5.x-Linux/lucene/common-build.xml:2611: java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:196) at java.net.SocketInputStream.read(SocketInputStream.java:122) at sun.security.ssl.InputRecord.readFully(InputRecord.java:442) at sun.security.ssl.InputRecord.read(InputRecord.java:480) at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:934) at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1332) at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1359) at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1343) at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559) at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185) at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:153) at org.apache.tools.ant.taskdefs.Get$GetThread.openConnection(Get.java:660) at org.apache.tools.ant.taskdefs.Get$GetThread.get(Get.java:579) at org.apache.tools.ant.taskdefs.Get$GetThread.run(Get.java:569) Total time: 57 minutes 36 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts [WARNINGS] Skipping publisher since build result 
is FAILURE Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8307) XXE Vulnerability
[ https://issues.apache.org/jira/browse/SOLR-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011174#comment-15011174 ] Shawn Heisey commented on SOLR-8307: bq. The patch attached here just modifies SolrJ. How is this related to config file parsing? I'm flailing in the dark here, and obviously do not really understand the implications of the code examples I found. The mbeans handler is what was mentioned in the bug report, so I followed that, and it uses XMLResponseParser, so that's what I modified. I'm not at all surprised that there's a better way. > XXE Vulnerability > - > > Key: SOLR-8307 > URL: https://issues.apache.org/jira/browse/SOLR-8307 > Project: Solr > Issue Type: Bug > Components: UI >Affects Versions: 5.3 >Reporter: Adam Johnson > Attachments: SOLR-8307.patch > > > Use the drop-down in the left menu to select a core. Use the “Watch Changes” > feature under the “Plugins / Stats” option. When submitting the changes, XML > is passed in the “stream.body” parameter and is vulnerable to XXE.
[jira] [Updated] (SOLR-8307) XXE Vulnerability
[ https://issues.apache.org/jira/browse/SOLR-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher updated SOLR-8307: --- Fix Version/s: 5.4 > XXE Vulnerability > - > > Key: SOLR-8307 > URL: https://issues.apache.org/jira/browse/SOLR-8307 > Project: Solr > Issue Type: Bug > Components: UI >Affects Versions: 5.3 >Reporter: Adam Johnson >Priority: Blocker > Fix For: 5.4 > > Attachments: SOLR-8307.patch > > > Use the drop-down in the left menu to select a core. Use the “Watch Changes” > feature under the “Plugins / Stats” option. When submitting the changes, XML > is passed in the “stream.body” parameter and is vulnerable to XXE.
[jira] [Updated] (SOLR-8307) XXE Vulnerability
[ https://issues.apache.org/jira/browse/SOLR-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher updated SOLR-8307: --- Priority: Blocker (was: Major) > XXE Vulnerability > - > > Key: SOLR-8307 > URL: https://issues.apache.org/jira/browse/SOLR-8307 > Project: Solr > Issue Type: Bug > Components: UI >Affects Versions: 5.3 >Reporter: Adam Johnson >Priority: Blocker > Fix For: 5.4 > > Attachments: SOLR-8307.patch > > > Use the drop-down in the left menu to select a core. Use the “Watch Changes” > feature under the “Plugins / Stats” option. When submitting the changes, XML > is passed in the “stream.body” parameter and is vulnerable to XXE.
[jira] [Commented] (SOLR-8283) factor out SortSpecParsing[Test] from QueryParsing[Test]
[ https://issues.apache.org/jira/browse/SOLR-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011195#comment-15011195 ] ASF subversion and git services commented on SOLR-8283: --- Commit 1715011 from [~cpoerschke] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1715011 ] SOLR-8283: factor out StrParser from QueryParsing.StrParser (merge in revision 1714994 from trunk) > factor out SortSpecParsing[Test] from QueryParsing[Test] > > > Key: SOLR-8283 > URL: https://issues.apache.org/jira/browse/SOLR-8283 > Project: Solr > Issue Type: Task >Reporter: Christine Poerschke >Assignee: Christine Poerschke > Attachments: SOLR-8283-part1of2.patch, SOLR-8283.patch > > > patch to follow
[jira] [Updated] (SOLR-8283) factor out SortSpecParsing[Test] from QueryParsing[Test]
[ https://issues.apache.org/jira/browse/SOLR-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christine Poerschke updated SOLR-8283: -- Attachment: SOLR-8283-part2of2.patch Splitting patch into two parts for easier and clearer handling: * part 1: factor out StrParser from QueryParsing.StrParser * part 2: factor out SortSpecParsing\[Test\] from QueryParsing\[Test\] (both QueryParsing and SortSpecParsing use the StrParser class) > factor out SortSpecParsing[Test] from QueryParsing[Test] > > > Key: SOLR-8283 > URL: https://issues.apache.org/jira/browse/SOLR-8283 > Project: Solr > Issue Type: Task >Reporter: Christine Poerschke >Assignee: Christine Poerschke > Attachments: SOLR-8283-part1of2.patch, SOLR-8283-part2of2.patch, > SOLR-8283.patch > > > patch to follow
[jira] [Updated] (SOLR-7597) TestRandomFaceting.testRandomFaceting failure
[ https://issues.apache.org/jira/browse/SOLR-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Stults updated SOLR-7597: --- Attachment: SOLR-7597.patch This patch checks to make sure delete requests don't exceed max boolean clauses. During the test we randomly add more documents and randomly delete some. So it's possible for the number of IDs in the model to grow beyond max booleans, and it's also possible for us to coincidentally ask for 99% of them to be deleted. > TestRandomFaceting.testRandomFaceting failure > - > > Key: SOLR-7597 > URL: https://issues.apache.org/jira/browse/SOLR-7597 > Project: Solr > Issue Type: Bug > Components: faceting >Affects Versions: 5.2, Trunk >Reporter: Steve Rowe > Attachments: SOLR-7597.patch > > > Reproduces on trunk and branch_5x: > {noformat} > ant test -Dtestcase=TestRandomFaceting -Dtests.method=testRandomFaceting > -Dtests.seed=2513EC725E45D7D5 -Dtests.slow=true -Dtests.locale=es_CU > -Dtests.timezone=ROK -Dtests.asserts=true -Dtests.file.encoding=US-ASCII > {noformat} > {noformat} >[junit4] 2> 12030 T11 C18 oasc.SolrException.log ERROR > org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: > Cannot parse 'id:(OPIM JTRN JUIQ BVUJ JZFO HEIX UXEU IGVD GCIY ZYEV GDJV YTWX > LMEF IWFQ JLKK CUUN UNRP ZCUP CVDM GHJD WAHV TCDX ZVDY NAJM NRIX JDCL XZWR > MVKC YXIB VCFT AEDD XYRW TZRV LYKR ODDS LSUF VTNE FPOF ALPR AAOF VAOW LUQW > OVNK LKWJ MOQE HXPC IHXD GRXC SKBD EFFJ OEJC LPJH MORL KETU CMBS ALPE QSMT > ROCO ZSFL ELUP MLTC RNXM XIZZ XIBV ZYIN GHRZ MWJK UCOL REOP OYPZ FTGY NJLY > IYPP GHVN PRCW KUCW LENR ABVU GQWC BFDB QWYK WNGI ZPWD BDNC GYJL GOZA JRSC > CKEM TJVV KEMQ IHQU HNIQ HLKJ UWEM TCHG EUQI ZUBH RONB BZDN KWJR BQZN IPYD > OSLQ HCBQ MIZR DPYV YTLV ZVAD SUYR ZXCC ZTYW XWVA NKIL AXOK GJOB MYLO JMUU > DTDI DFLY TPCE MWOH PXRM JJBG AWIQ QMEG XNEW STED NKPW HJDB QVWQ MDTW JNXX > BCFL MJPL XPWR XXXB PFMN FWLC CTCI YRNY ZETI KVCG PBAC UJCZ IIWL KQWT WSHW > OYPB IVEQ OHOC YCAR MZJM 
SYGM JNHQ USBZ MLMO RIBP VCGN UATJ JLSX XZCR YKVB > DQTT JLXW XQKA LFGU TKSI ISXG IECJ TSZN ICDA CNLE EOFY MYUJ NFST FKCH RSMU > XPUK ZAIO SZVS DJAJ BDMV OFQV LVWP BOJB OZWN UTDB GMLY RJGI HBYX ZRLW VMBZ > JOKG OZSL JDCM FLJZ JIIR VUMR BIKZ OVSU FRKK CLYQ EQTN QKQR AUYT TEHX ESUL > ERYX YDYZ HRUZ VNJR GUPH RJBB ECJF NEUE FHZS YWZD DTGG KNHF BBIR BNBK FLRO > BXNC EOVD BCCK IQNX FZGN PMRY JDCB BMEW CCUF XYZN YKDV VMRL KTOO NZAF XMBF > VYFS AQUV GUFG PRTD YPRM QRGX JXOG MZYE GAQF ICPQ LCPM XXPI OSDZ VTEP RTMB > DNHW FNZH DGGZ QICW XZZD WAZV RKVP WUCK SIOJ TWDE XWON PGJM GDRU KRYM YHXA > SKTW UNCV YWJS KRXO IXFL VZFQ YIKQ RIWZ XJSM AWNU VIRQ YTJR DNGD AIDE GAVB > NAYK ACJM JXOP GHKW BDGQ ORQD VXLF WQAX JZHM GASX CDBH EANO KYDK SPSL TFHV > HFHI AYBX HCTT HOYE YWDJ YFDB YFNQ ZGNF DSSV SCUT IPLI NHZL HYED VTGA DGQT > NRDA KMFR LIVF HGCH ZXKV OZHF ETKX HEQX JUYP UVRJ FSRL EUOY XAER JMJK TQQE > SZRD FTFS ZHSH AXGC LLOF DQFO ORPW EBTN RXRE WUCC SACF TQQM ECTU VIBW YRHJ > YANS QYYC DLFW KFZP QPTA GTYN DLSH HEBE BKQR ZKNC ADVM KOBU TJEC RUYU DBPW > PPLF QOAE NSZQ YPLN ROMQ ZXDM FZDP IAWX XLHC PAKI CQSX QTOE KLLR QYVM EENH > SGBP BNCT PJRU ONVH BKTC QBSA NIDU PQYZ ESRO HPSN DICT VMUU RXLC BCKG XPHV > SJDA BWBB GTBV ZDTW VGEL PMLP QBJJ AFLF FBNX LFCU OYBD PXSQ UCLB KOEZ JZHT > JDGF XNUP GUVM BJVC ZLFS OYBK YCAP LWDW ZAAP SNIA QWLY OAIB CVZH EPWB WPJK > NEBO QEQQ DIQS FYYT REAY IHSC HTMF IYBF GNWP AOPO COLA HVDT MBCU HEHJ JVZE > WDZA GZMD HSNQ WBGQ EDAS JCVP TFHW KCLS FBDZ YADX GWAF UNMF YAAB MAYF SEVC > JXQR MMHM CLNC JZYG OYXT MGSB TTPW AOEK OKPO DJOU MBOL WNML QNMN SNLI EFRM > ZTDE NFBW BABV MZYB UQZR NFWU GOTZ VJIB VMSB REON BNLD KIMC RABW WFYS GQVX > SKEC YGZY XRTP YEEK JIBD HUDU JEPT VEYH JYQK TMKK PAWS JITH PMAO TPTZ WEYZ > DDUJ IGIN PLJT UVCI WHUD FAOA TMDV YBDF SGHO ULJY HOIY VLJU FRLE GCJJ TSQG > CAFT HCLB YMKE ASHE YLCO KOLP KVWS EMQL ELDN KWSF KTEG SMOB ZEWF CPQG JRIC > CHNN GPTI XJBP KUGV SBPS QYIG EJBW UYKD EMYP YVVP CRLR HYFS ODEV LHUB MTEP > CXOY QETZ RAQK CZBD 
ZGHU WMEP LBUU OKPM VZMV EDKB ONHB FVBQ VIVC ETEM NCSI > ZYEL KTHU HVYM HNFW YLEB HOPD GSBH VJVE FFVD SEAX ZRJH DQSF VEUR WUDN BYNW > EBET YPVJ SPYS UYQR PUYQ CJLU VTHG FBNW TXOE AYSJ IVBU CUDZ QJLL XDQC VAEU > RGLA BHFZ WAJW ANBX GBPB NEVF PFOX ONQB HXQH DRQH VCSE GIAZ CGAA UZPL LDMW > VQDV YSOG LTDO VEOA IKRM GGTX ERKK RDFP ZPSU ZUPA FCYI CJLK HWJS UNNF QAWB > GPYN FOLP VSZK JDGE YNLI KUUZ LZPP NNQU PSLS OUUR BNIZ FAIU CCZB DDIA UZYF > CDOS DRJD SJBA UMNM FUWT BBHG UJYC VNVK LKJJ KKAE RFLP RRJU UBZL XGLA FAER > AKIZ IXIA HIVQ YFOC MUPX ICIU HSFN HJGW GXFU CHEP EJYJ LHOR DORN XRSI JKXX > PXDS WSHT NYUX SCGQ JFJX WNZM HHIC IREW ZHFF JAEN WIYL POZR RKHB XQGY LTDM > SLLM ILGO UMQS PKBN VZCA QIGT KPSW BWAR RNBY IHTO HHPU FOSA AHCE KEBI WGBP > SZIH QZSA PWIC
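[Editor's sketch] The guard Scott Stults describes for SOLR-7597 amounts to splitting a large set of ids into delete batches so that no single id:(...) query exceeds the boolean-clause limit. The helper below is a hypothetical illustration of that batching idea, not the attached SOLR-7597.patch.

```java
import java.util.ArrayList;
import java.util.List;

public class DeleteBatcher {

    // Split ids into chunks of at most maxClauses, so each generated
    // delete-by-query "id:(A B C ...)" stays under maxBooleanClauses.
    static List<List<String>> batches(List<String> ids, int maxClauses) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += maxClauses) {
            out.add(ids.subList(i, Math.min(i + maxClauses, ids.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 2500; i++) ids.add("doc" + i);
        // With the common default limit of 1024 clauses, 2500 ids need 3 batches.
        List<List<String>> b = batches(ids, 1024);
        System.out.println(b.size());        // prints 3
        System.out.println(b.get(2).size()); // prints 452
    }
}
```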
[jira] [Updated] (SOLR-8234) Federated Search (new) - DJoin
[ https://issues.apache.org/jira/browse/SOLR-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Winch updated SOLR-8234: Attachment: SOLR-8234.patch Patch including in parent docs and improved unit test > Federated Search (new) - DJoin > -- > > Key: SOLR-8234 > URL: https://issues.apache.org/jira/browse/SOLR-8234 > Project: Solr > Issue Type: New Feature >Reporter: Tom Winch >Priority: Minor > Labels: federated_search > Fix For: 4.10.3 > > Attachments: SOLR-8234.patch, SOLR-8234.patch > > > This issue describes a MergeStrategy implementation (DJoin) to facilitate > federated search - that is, distributed search over documents stored in > separated instances of SOLR (for example, one server per continent), where a > single document (identified by an agreed, common unique id) may be stored in > more than one server instance, with (possibly) differing fields and data. > When the MergeStrategy is used in a request handler (via the included > QParser) in combination with distributed search (shards=), documents having > an id that has already been seen are not discarded (as per the default > behaviour) but, instead, are collected and returned as a group of documents > all with the same id taking a single position in the result set (this is > implemented using parent/child documents, with an indicator field in the > parent - see example output, below). > Documents are sorted in the result set based on the highest ranking document > with the same id. It is possible for a document ranking high in one shard to > rank very low on another shard. As a consequence of this, all shards must be > asked to return the fields for every document id in the result set (not just > of those documents they returned), so that all the component parts of each > document in the search result set are returned. > As usual, search parameters are passed on to each shard. 
So that the shards > do not need any additional configurations in their definition of the /select > request handler, we use the FilterQParserSearchComponent which is configured > to filter out the \{!djoin\} search parameter - otherwise, the target request > handler complains about the missing query parser definition. See the example > config, below. > This issue combines with others to provide full federated search support. See > also SOLR-8235 and SOLR-8236. > Note that this is part of a new implementation of federated search as opposed > to the older issues SOLR-3799 through SOLR-3805. > -- > Example request handler configuration: > {code:xml} >class="org.apache.solr.search.djoin.FilterQParserSearchComponent"> > djoin > > >class="org.apache.solr.search.djoin.DJoinQParserPlugin" /> > > class="org.apache.solr.search.djoin.LocalShardHandlerFactory" /> > >name="shards">http://shard1/solr/core,http://shard2/solr/core,http://shard3/solr/core > true > {!djoin} > > > filter > > > {code} > Example output: > {code:xml} > > > > 0 > 33 > > *:* >name="shards">http://shard1/solr/core,http://shard2/solr/core,http://shard3/solr/core > true > xml > {!djoin} > *,[shard] > > > > > true > > 200 > 1973 > http://shard2/solr/core > 1515645309629235200 > > > 200 > 2015 > http://shard1/solr/core > 1515645309682712576 > > > > true > > 100 > 873 > http://shard1/solr/core > 1515645309629254124 > > > 100 > 2001 > http://shard3/solr/core > 1515645309682792852 > > > > true > > 300 > 1492 > http://shard2/solr/core > 1515645309629251252 > > > > > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
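[Editor's sketch] The merge behaviour described above (collect documents sharing an id into one group, then rank each group by its highest-scoring member) can be reduced to a few lines. The names below are stand-ins for illustration only, not the DJoin MergeStrategy classes from the patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DJoinSketch {
    // Minimal stand-in for a shard hit: the agreed common id, the shard-local
    // relevance score, and the shard the hit came from.
    record Doc(String id, double score, String shard) {}

    // Group hits from all shards by id; each group takes a single position in
    // the merged result set, ordered by the group's best-scoring member.
    static List<List<Doc>> merge(List<Doc> allShardHits) {
        Map<String, List<Doc>> byId = new LinkedHashMap<>();
        for (Doc d : allShardHits) {
            byId.computeIfAbsent(d.id(), k -> new ArrayList<>()).add(d);
        }
        List<List<Doc>> groups = new ArrayList<>(byId.values());
        groups.sort(Comparator.comparingDouble(
                (List<Doc> g) -> g.stream().mapToDouble(Doc::score).max().orElse(0)).reversed());
        return groups;
    }

    public static void main(String[] args) {
        List<Doc> hits = List.of(
                new Doc("100", 0.2, "shard1"),
                new Doc("200", 0.9, "shard2"),
                new Doc("100", 0.8, "shard3"));
        // id 200's best score (0.9) beats id 100's best (0.8), so its group leads.
        System.out.println(merge(hits).get(0).get(0).id()); // prints 200
    }
}
```

This also makes the follow-up cost visible: once groups are formed, every shard must be asked for the fields of every id in the merged page, since a group may contain members from shards that did not originally return them.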
[jira] [Updated] (SOLR-8234) Federated Search (new) - DJoin
[ https://issues.apache.org/jira/browse/SOLR-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Winch updated SOLR-8234: Description: This issue describes a MergeStrategy implementation (DJoin) to facilitate federated search - that is, distributed search over documents stored in separated instances of SOLR (for example, one server per continent), where a single document (identified by an agreed, common unique id) may be stored in more than one server instance, with (possibly) differing fields and data. When the MergeStrategy is used in a request handler (via the included QParser) in combination with distributed search (shards=), documents having an id that has already been seen are not discarded (as per the default behaviour) but, instead, are collected and returned as a group of documents all with the same id taking a single position in the result set (this is implemented using parent/child documents, with an indicator field in the parent - see example output, below). Documents are sorted in the result set based on the highest ranking document with the same id. It is possible for a document ranking high in one shard to rank very low on another shard. As a consequence of this, all shards must be asked to return the fields for every document id in the result set (not just of those documents they returned), so that all the component parts of each document in the search result set are returned. As usual, search parameters are passed on to each shard. So that the shards do not need any additional configurations in their definition of the /select request handler, we use the FilterQParserSearchComponent which is configured to filter out the \{!djoin\} search parameter - otherwise, the target request handler complains about the missing query parser definition. See the example config, below. This issue combines with others to provide full federated search support. See also SOLR-8235 and SOLR-8236. 
Note that this is part of a new implementation of federated search as opposed to the older issues SOLR-3799 through SOLR-3805. -- Example request handler configuration: {code:xml} http://shard1/solr/core,http://shard2/solr/core,http://shard3/solr/core true {!djoin} filter {code} Example output: {code:xml} 0 33 *:* http://shard1/solr/core,http://shard2/solr/core,http://shard3/solr/core true xml {!djoin} *,[shard] true 200 1973 http://shard2/solr/core 1515645309629235200 200 2015 http://shard1/solr/core 1515645309682712576 true 100 873 http://shard1/solr/core 1515645309629254124 100 2001 http://shard3/solr/core 1515645309682792852 true 300 1492 http://shard2/solr/core 1515645309629251252 {code} was: This issue describes a MergeStrategy implementation (DJoin) to facilitate federated search - that is, distributed search over documents stored in separated instances of SOLR (for example, one server per continent), where a single document (identified by an agreed, common unique id) may be stored in more than one server instance, with (possibly) differing fields and data. When the MergeStrategy is used in a request handler (via the included QParser) in combination with distributed search (shards=), documents having an id that has already been seen are not discarded (as per the default behaviour) but, instead, are collected and returned as a group of documents all with the same id taking a single position in the result set (this is implemented using parent/child documents, with an indicator field in the parent - see example output, below). Documents are sorted in the result set based on the highest ranking document with the same id. It is possible for a document ranking high in one shard to rank very low on another shard. As a consequence of this, all shards must be asked to return the fields for every document id in the result set (not just of those documents they returned), so that all the component parts of each document in the search result set are returned. 
As usual, search parameters are passed on to each shard. So that the shards do not need any additional configurations in their definition of the /select request handler, we use the FilterQParserSearchComponent which is configured to filter out the \{!djoin\} search parameter - otherwise, the target request handler complains about the missing query parser definition. See the example config, below. This issue combines with others to provide full federated search support. See also SOLR-8235 and SOLR-8236. Note that this is part of a new
[jira] [Commented] (SOLR-8307) XXE Vulnerability
[ https://issues.apache.org/jira/browse/SOLR-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011185#comment-15011185 ] Shawn Heisey commented on SOLR-8307: Thank you for taking a look and rescuing me from my lack of knowledge. I appreciate the things I learn from my colleagues here. > XXE Vulnerability > - > > Key: SOLR-8307 > URL: https://issues.apache.org/jira/browse/SOLR-8307 > Project: Solr > Issue Type: Bug > Components: UI >Affects Versions: 5.3 >Reporter: Adam Johnson > Attachments: SOLR-8307.patch > > > Use the drop-down in the left menu to select a core. Use the “Watch Changes” > feature under the “Plugins / Stats” option. When submitting the changes, XML > is passed in the “stream.body” parameter and is vulnerable to XXE.
[jira] [Updated] (SOLR-8234) Federated Search (new) - DJoin
[ https://issues.apache.org/jira/browse/SOLR-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Winch updated SOLR-8234: Attachment: SOLR-8234.patch > Federated Search (new) - DJoin > -- > > Key: SOLR-8234 > URL: https://issues.apache.org/jira/browse/SOLR-8234 > Project: Solr > Issue Type: New Feature >Reporter: Tom Winch >Priority: Minor > Labels: federated_search > Fix For: 4.10.3 > > Attachments: SOLR-8234.patch, SOLR-8234.patch > > > This issue describes a MergeStrategy implementation (DJoin) to facilitate > federated search - that is, distributed search over documents stored in > separated instances of SOLR (for example, one server per continent), where a > single document (identified by an agreed, common unique id) may be stored in > more than one server instance, with (possibly) differing fields and data. > When the MergeStrategy is used in a request handler (via the included > QParser) in combination with distributed search (shards=), documents having > an id that has already been seen are not discarded (as per the default > behaviour) but, instead, are collected and returned as a group of documents > all with the same id taking a single position in the result set (this is > implemented using parent/child documents, with an indicator field in the > parent - see example output, below). > Documents are sorted in the result set based on the highest ranking document > with the same id. It is possible for a document ranking high in one shard to > rank very low on another shard. As a consequence of this, all shards must be > asked to return the fields for every document id in the result set (not just > of those documents they returned), so that all the component parts of each > document in the search result set are returned. > As usual, search parameters are passed on to each shard. 
So that the shards > do not need any additional configurations in their definition of the /select > request handler, we use the FilterQParserSearchComponent which is configured > to filter out the \{!djoin\} search parameter - otherwise, the target request > handler complains about the missing query parser definition. See the example > config, below. > This issue combines with others to provide full federated search support. See > also SOLR-8235 and SOLR-8236. > Note that this is part of a new implementation of federated search as opposed > to the older issues SOLR-3799 through SOLR-3805. > -- > Example request handler configuration: > {code:xml} >class="org.apache.solr.search.djoin.FilterDJoinQParserSearchComponent" /> > >class="org.apache.solr.search.djoin.DJoinQParserPlugin" /> > > class="org.apache.solr.search.djoin.LocalShardHandlerFactory" /> > >name="shards">http://shard1/solr/core,http://shard2/solr/core,http://shard3/solr/core > true > {!djoin} > > > filter > > > {code} > Example output: > {code:xml} > > > > 0 > 33 > > *:* >name="shards">http://shard1/solr/core,http://shard2/solr/core,http://shard3/solr/core > true > xml > {!djoin} > *,[shard] > > > > > true > > 200 > 1973 > http://shard2/solr/core > 1515645309629235200 > > > 200 > 2015 > http://shard1/solr/core > 1515645309682712576 > > > > true > > 100 > 873 > http://shard1/solr/core > 1515645309629254124 > > > 100 > 2001 > http://shard3/solr/core > 1515645309682792852 > > > > true > > 300 > 1492 > http://shard2/solr/core > 1515645309629251252 > > > > > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8234) Federated Search (new) - DJoin
[ https://issues.apache.org/jira/browse/SOLR-8234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Winch updated SOLR-8234: Attachment: (was: SOLR-8234.patch) > Federated Search (new) - DJoin > -- > > Key: SOLR-8234 > URL: https://issues.apache.org/jira/browse/SOLR-8234 > Project: Solr > Issue Type: New Feature >Reporter: Tom Winch >Priority: Minor > Labels: federated_search > Fix For: 4.10.3 > > Attachments: SOLR-8234.patch, SOLR-8234.patch > > > This issue describes a MergeStrategy implementation (DJoin) to facilitate > federated search - that is, distributed search over documents stored in > separated instances of SOLR (for example, one server per continent), where a > single document (identified by an agreed, common unique id) may be stored in > more than one server instance, with (possibly) differing fields and data. > When the MergeStrategy is used in a request handler (via the included > QParser) in combination with distributed search (shards=), documents having > an id that has already been seen are not discarded (as per the default > behaviour) but, instead, are collected and returned as a group of documents > all with the same id taking a single position in the result set (this is > implemented using parent/child documents, with an indicator field in the > parent - see example output, below). > Documents are sorted in the result set based on the highest ranking document > with the same id. It is possible for a document ranking high in one shard to > rank very low on another shard. As a consequence of this, all shards must be > asked to return the fields for every document id in the result set (not just > of those documents they returned), so that all the component parts of each > document in the search result set are returned. > As usual, search parameters are passed on to each shard. 
So that the shards > do not need any additional configurations in their definition of the /select > request handler, we use the FilterQParserSearchComponent which is configured > to filter out the \{!djoin\} search parameter - otherwise, the target request > handler complains about the missing query parser definition. See the example > config, below. > This issue combines with others to provide full federated search support. See > also SOLR-8235 and SOLR-8236. > Note that this is part of a new implementation of federated search as opposed > to the older issues SOLR-3799 through SOLR-3805. > -- > Example request handler configuration: > {code:xml} >class="org.apache.solr.search.djoin.FilterDJoinQParserSearchComponent" /> > >class="org.apache.solr.search.djoin.DJoinQParserPlugin" /> > > class="org.apache.solr.search.djoin.LocalShardHandlerFactory" /> > >name="shards">http://shard1/solr/core,http://shard2/solr/core,http://shard3/solr/core > true > {!djoin} > > > filter > > > {code} > Example output: > {code:xml} > > > > 0 > 33 > > *:* >name="shards">http://shard1/solr/core,http://shard2/solr/core,http://shard3/solr/core > true > xml > {!djoin} > *,[shard] > > > > > true > > 200 > 1973 > http://shard2/solr/core > 1515645309629235200 > > > 200 > 2015 > http://shard1/solr/core > 1515645309682712576 > > > > true > > 100 > 873 > http://shard1/solr/core > 1515645309629254124 > > > 100 > 2001 > http://shard3/solr/core > 1515645309682792852 > > > > true > > 300 > 1492 > http://shard2/solr/core > 1515645309629251252 > > > > > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7912) Add support for boost and exclude the queried document id in MoreLikeThis QParser
[ https://issues.apache.org/jira/browse/SOLR-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011333#comment-15011333 ] Anshum Gupta commented on SOLR-7912: [~blackwinter] absolutely. I'll take a look at the updated patch. > Add support for boost and exclude the queried document id in MoreLikeThis > QParser > - > > Key: SOLR-7912 > URL: https://issues.apache.org/jira/browse/SOLR-7912 > Project: Solr > Issue Type: Improvement >Reporter: Anshum Gupta >Assignee: Anshum Gupta > Attachments: SOLR-7912.patch, SOLR-7912.patch, SOLR-7912.patch, > SOLR-7912.patch > > > Continuing from SOLR-7639. We need to support boost, and also exclude input > document from returned doc list.
[jira] [Commented] (SOLR-8311) SolrCoreAware and ResourceLoaderAware lifecycle is fragile - particularly with objects that can be created after SolrCore is live
[ https://issues.apache.org/jira/browse/SOLR-8311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011549#comment-15011549 ] Hoss Man commented on SOLR-8311: The motivation for filing this issue was SOLR-8280, where I realized that even though {{SimilarityFactory}} was an allowed {{SolrCoreAware}} API, there were situations when dealing with managed schema that would result in a new SimilarityFactory getting inited at run time w/o ever having {{inform(SolrCore)}} called... {quote} The root problem seems to be that when using the SolrResourceLoader to create new instances of objects, the loader tracks which things are SolrCoreAware, ResourceLoaderAware, and/or SolrInfoMBean. Then, just before the SolrCore finishes initializing itself, it calls a method on SolrResourceLoader to take appropriate action to inform those instances (and/or add them to the MBean registry). The problem happens when any new instances are created by the SolrResourceLoader _after_ the SolrCore is up and running -- it currently has a {{live}} boolean it uses to just flat out ignore whether or not these instances are SolrCoreAware, ResourceLoaderAware, and/or SolrInfoMBean, meaning that nothing in the call stack ever informs them about the SolrCore. It looks like SOLR-4658 included a bit of a hacky workaround for the ResourceLoaderAware schema elements (see IndexSchema's constructor, which calls {{loader.inform(loader);}})... http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/schema/IndexSchema.java?r1=1463182=1463181=1463182 ...this seems really sketchy because it causes *any* ResourceLoaderAware plugins inited so far by the core to be {{inform(ResourceLoader)}}ed once the first IndexSchema is created -- even though that's not supposed to happen until much later in the SolrCore constructor, just before the CountDownLatch is released. 
What it does do, however, is ensure that when a new schema gets loaded later (by the REST API, or a schemaless update processor) the ResourceLoaderAware fieldtypes/analyzers are good to go -- but that doesn't do anything to help SolrCoreAware plugins like SimilarityFactory. {quote} This issue also led to the discovery that {{SimilarityFactory.inform(SolrCore)}} already had special handling because even on startup it _had_ to be called before any other {{SolrCoreAware}} impls might be informed by {{SolrResourceLoader}}, in case they tried to access the core's searcher (which depends on the Similarity)... {quote} * There was already a special kludge for SolrCoreAware SimFactories in SolrCore.initSchema ** looks like this was originally for ensuring that the SimFactories were usable when other SolrCoreAware things (like listeners) got informed of the SolrCore and tried to use the SolrIndexSearcher (which depended on the sim) So I think the most straightforward solution to the problem (SimilarityFactories that implement SolrCoreAware playing nice with managed schema) is to refactor that existing kludge from SolrCore.initSchema to SolrCore.setLatestSchema {quote} SOLR-8280 also contained some discussion about the problem of trying to make a general fix for this in SolrResourceLoader... HOSS: {quote} I'm attaching a work-in-progress patch where I attempted to fix the underlying problem with SolrResourceLoader by having it keep a reference to the SolrCore it's tied to, such that any new instances created after that would be immediately informed of the SolrCore/ResourceLoader. This fixes some of the tests I mentioned before in this issue that have problems with SchemaSimilarityFactory, but causes failures in other existing tests that reload the schema – because any FieldType that is ResourceLoaderAware is now being "informed" of the loader as soon as it's instantiated – before even basic init() methods are called. 
Which makes sense in hindsight – my whole approach here is flawed because the contract is supposed to be that the init methods will always be called first, and any (valid) inform methods will be called at some point after that once the core/loader is available, but before the instance is used ... calling "new" then "inform" then "init" is madness. I honestly don't know if there is a sane way to solve this problem in the general case... {quote} Alan: {quote} I have a half-implemented patch hanging around somewhere that tried to clean this up a bit. I think the root problem is that there are two circumstances in which we're using SolrResourceLoader: a) during core initialization, when we need to call init() immediately but wait to call inform() until after the loading latch has been released, and then b) to create new objects once the core is up and serving queries. I tried to split this out into two separate SRL implementations, one of which is private to SolrCore and used only in the constructor, and does the
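The init-before-inform contract discussed in this thread can be illustrated with a minimal, self-contained sketch. The classes below are hypothetical stand-ins, not Solr's actual SolrResourceLoader/SolrCoreAware API: a loader defers inform() for instances created before the core goes live, and informs immediately for instances created afterwards — so init() always runs first and inform() exactly once.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified mirror of the lifecycle contract: plugins are
// created, then init()ed, and only afterwards inform()ed of the live core.
class LifecycleSketch {
    interface CoreAware { void inform(String core); }

    static class Plugin implements CoreAware {
        final List<String> calls = new ArrayList<>();  // records call order
        void init() { calls.add("init"); }
        public void inform(String core) { calls.add("inform:" + core); }
    }

    static class Loader {
        private final List<CoreAware> waiting = new ArrayList<>();
        private String liveCore = null;

        Plugin newInstance() {
            Plugin p = new Plugin();
            p.init();                  // init always runs first
            if (liveCore == null) {
                waiting.add(p);        // startup: defer inform() until live
            } else {
                p.inform(liveCore);    // post-startup: inform immediately
            }
            return p;
        }

        // Called once, at the end of core initialization.
        void coreIsLive(String core) {
            liveCore = core;
            for (CoreAware c : waiting) c.inform(core);
            waiting.clear();
        }
    }
}
```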
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011570#comment-15011570 ] Ishan Chattopadhyaya commented on SOLR-8220: Thanks for the review, Keith. bq. 1. [...] This could potentially be very expensive to compute for every single field for every single document and also add unnecessary GC pressure by creating a new HashSet for all the fields for every single document. I was aware of this, and wanted to fix this as part of the "cleanup / refactoring" I promised. bq. {{doc.getField(fieldName)==null}} the doc fields are a list so this will be O( n ) for each lookup. I used that to ensure we're not re-adding unstored docvalues a second time to the same document. This is necessary here so that we don't re-add such fields to a document that was obtained from the documentCache and already has all unstored docvalues in it. I can create a set of fields inside the {{StoredDocument}} class so that a hasField lookup can be sped up. However, given that it is a Lucene class, I have left this be. Any suggestions? bq. 3) Re multivalued fields: doing introspection for every single value of a field for every document is not fast. I think it shouldn't be a problem. In modern JVMs, the {{instanceof}} check has negligible cost. However, I will do it once per multivalued field in my next patch. bq. 4) {{SchemaField schemaField = schema.getField(fieldName);}} this throws an exception if the field name is not in the schema (think typos in FL) If it is a dynamic field, it will still work; a wrong field name won't work here. Shouldn't a wrong field name throw an exception, rather than silently dropping it? I am split either way. bq. This creates a whole bunch of new objects which could be slow and cause a lot of GC pressure, although it may not be an issue. I think this creates at most only the value source object, which isn't too bad. Internally, it uses the docvalues API. 
> Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
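Ishan's suggestion above — keep a set of field names so a hasField lookup is O(1) instead of an O(n) scan of the field list — could look roughly like this sketch. SimpleDoc is a hypothetical stand-in for the Lucene document class, which, as the comment notes, can't easily be modified:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: build a field-name set lazily, once, instead of scanning the
// field list on every doc.getField(name) == null check.
class SimpleDoc {
    final List<String> fieldNames = new ArrayList<>();
    private Set<String> nameSet;  // built lazily, invalidated on add()

    void add(String name) {
        fieldNames.add(name);
        nameSet = null;
    }

    boolean hasField(String name) {  // amortized O(1) vs O(n) list scan
        if (nameSet == null) nameSet = new HashSet<>(fieldNames);
        return nameSet.contains(name);
    }
}
```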
[GitHub] lucene-solr pull request: WordDelimiterFilter - Don't split words ...
GitHub user smartprix opened a pull request: https://github.com/apache/lucene-solr/pull/210 WordDelimiterFilter - Don't split words marked as keyword Currently WordDelimiterFilter also splits keywords into tokens. E.g. if 128GB is marked as a keyword using KeywordMarkerFilter, WordDelimiterFilter would still split it into 128 and GB, while ideally it should not split it, as it is a keyword. You can merge this pull request into a Git repository by running: $ git pull https://github.com/smartprix/lucene-solr trunk Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/210.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #210 commit 50385c79e13c32226e92234b411cf0f2a80cae1b Author: smartprix Date: 2015-11-18T18:45:37Z WordDelimiterFilter - Don't split word marked as keyword --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
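The behavior the PR asks for — leave keyword-flagged tokens unsplit — can be sketched independently of Lucene's TokenStream machinery. This is a hypothetical splitter illustrating the intent, not the actual WordDelimiterFilter code; in the real filter the flag would come from KeywordAttribute set by KeywordMarkerFilter:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: a splitter that leaves keyword-flagged tokens intact, mirroring
// what the PR proposes for WordDelimiterFilter.
class KeywordAwareSplitter {
    static List<String> split(String token, boolean isKeyword) {
        if (isKeyword) {
            return Arrays.asList(token);  // keyword: emit unchanged
        }
        // naive letter/digit boundary split, e.g. "128GB" -> ["128", "GB"]
        return Arrays.asList(
            token.split("(?<=\\d)(?=\\p{L})|(?<=\\p{L})(?=\\d)"));
    }
}
```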
[jira] [Created] (SOLR-8311) SolrCoreAware and ResourceLoaderAware lifecycle is fragile - particularly with objects that can be created after SolrCore is live
Hoss Man created SOLR-8311: -- Summary: SolrCoreAware and ResourceLoaderAware lifecycle is fragile - particularly with objects that can be created after SolrCore is live Key: SOLR-8311 URL: https://issues.apache.org/jira/browse/SOLR-8311 Project: Solr Issue Type: Bug Reporter: Hoss Man In general, the situation of when/how {{ResourceLoaderAware}} & {{SolrCoreAware}} instances are "informed" of the ResourceLoader & SolrCore is very kludgy and involves a lot of special cases. For objects initialized _before_ the SolrCore goes "live", {{SolrResourceLoader}} tracks these instances internally, and calls {{inform()}} on all of them -- but for instances created _after_ the SolrCore is live (ex: schema pieces created via runtime REST calls), {{SolrResourceLoader}} does nothing to ensure they are later informed (and typically can't, because that must happen after whatever type-specific 'init' logic takes place). So there is a lot of special-case handling to call {{inform}} methods sprinkled throughout the code. This issue serves as a reference point to track/link various comments on the situation, and to cite in comments warning developers about how finicky it is to muck with the list of allowed SolrCoreAware & ResourceLoaderAware implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-EA] Lucene-Solr-trunk-Linux (32bit/jdk1.9.0-ea-b90) - Build # 14954 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/14954/ Java: 32bit/jdk1.9.0-ea-b90 -client -XX:+UseParallelGC 3 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.cloud.SaslZkACLProviderTest Error Message: 5 threads leaked from SUITE scope at org.apache.solr.cloud.SaslZkACLProviderTest: 1) Thread[id=9400, name=ou=system.data, state=TIMED_WAITING, group=TGRP-SaslZkACLProviderTest] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:218) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2103) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1136) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:853) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1083) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1143) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:632) at java.lang.Thread.run(Thread.java:747)2) Thread[id=9399, name=kdcReplayCache.data, state=TIMED_WAITING, group=TGRP-SaslZkACLProviderTest] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:218) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2103) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1136) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:853) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1083) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1143) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:632) at 
java.lang.Thread.run(Thread.java:747)3) Thread[id=9397, name=apacheds, state=WAITING, group=TGRP-SaslZkACLProviderTest] at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:516) at java.util.TimerThread.mainLoop(Timer.java:526) at java.util.TimerThread.run(Timer.java:505)4) Thread[id=9398, name=groupCache.data, state=TIMED_WAITING, group=TGRP-SaslZkACLProviderTest] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:218) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2103) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1136) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:853) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1083) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1143) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:632) at java.lang.Thread.run(Thread.java:747)5) Thread[id=9401, name=changePwdReplayCache.data, state=TIMED_WAITING, group=TGRP-SaslZkACLProviderTest] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:218) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2103) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1136) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:853) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1083) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1143) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:632) at java.lang.Thread.run(Thread.java:747) Stack 
Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 5 threads leaked from SUITE scope at org.apache.solr.cloud.SaslZkACLProviderTest: 1) Thread[id=9400, name=ou=system.data, state=TIMED_WAITING, group=TGRP-SaslZkACLProviderTest] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:218) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2103) at
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011760#comment-15011760 ] Keith Laban commented on SOLR-8220: --- bq. I used that to ensure we're not re-adding unstored docvalues a second time to the same document. This is necessary here so that we don't re-add such fields to a document that was obtained from the documentCache and already has all unstored docvalues in it. I can create a set of fields inside the StoredDocument class so that a hasField lookup can be sped up. However, given that it is a Lucene class, I have left this be. Any suggestions? This shouldn't be an issue since the hook is called after caching is done. This could get really expensive if you are getting a few thousand documents that have hundreds of fields. I think the real issue is how we cache this efficiently. I think that will require modifying LazyDocument (see my comments above). bq. If it is a dynamic field, it will still work; a wrong field name won't work here. Shouldn't a wrong field name throw an exception, rather than silently dropping it? I am split either way. This is more a backwards-compat thing. What is the current behavior for stored fields? bq. I think this creates at most only the value source object, which isn't too bad. Internally, it uses the docvalues API. For a string field, getValueSource creates a new StrFieldSource and getValues creates a new DocTermsIndexDocValues. Both of these closures add overhead, especially if you're doing this hundreds of times for thousands of documents. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. 
Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
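The fl semantics proposed in the issue description boil down to a per-field decision about where a value is served from. A hedged sketch, with two booleans standing in for the much richer real Solr schema metadata:

```java
// Sketch of the per-field source decision implied by the proposed fl
// behavior: a stored field is returned from stored fields; an unstored
// field with docValues falls back to docValues; otherwise nothing.
class FlResolver {
    static String source(boolean stored, boolean docValues) {
        if (stored) return "stored";        // stored wins when present
        if (docValues) return "docValues";  // unstored but in docValues
        return "absent";                    // neither: nothing to return
    }
}
```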
[jira] [Created] (SOLR-8312) Add doc set size and number of buckets metrics
Michael Sun created SOLR-8312: - Summary: Add doc set size and number of buckets metrics Key: SOLR-8312 URL: https://issues.apache.org/jira/browse/SOLR-8312 Project: Solr Issue Type: Sub-task Components: Facet Module Reporter: Michael Sun The doc set size and number of buckets represents the input data size and intermediate data size for each step of facet. Therefore they are useful metrics to be included in telemetry. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011854#comment-15011854 ] Ishan Chattopadhyaya commented on SOLR-8220: bq. This shouldn't be an issue since the hook is called after caching is done. Even if this is called after the document has been added to the cache, this decorate() method changes the same doc object that has been added to the cache. And hence, next time the document is fetched from the cache, it will contain the previously decorated docvalues as part of the stored doc from the cache. I'll look at what it will take to modify the LazyDocument to make this work differently. Are you already looking into it, or have some thoughts around it? bq. Both of these closures add overhead especially if you're doing this hundreds of times for thousands of documents Yes, that makes sense; I hadn't noticed the second object getting created. We should avoid this overhead if possible. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. 
> Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
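The cache-mutation hazard Ishan describes — a decorate() step altering the very object held by the documentCache, so later cache hits see the decorated fields — is easy to demonstrate with plain maps standing in for Solr documents (hypothetical field name and value); decorating a copy avoids it:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: decorating the cached map in place changes what every later
// cache hit sees; decorating a copy leaves the cached instance pristine.
class DecorateSketch {
    static Map<String, Object> decorateInPlace(Map<String, Object> doc) {
        doc.put("dvField", 42);  // mutates the shared, cached instance
        return doc;
    }

    static Map<String, Object> decorateCopy(Map<String, Object> doc) {
        Map<String, Object> copy = new LinkedHashMap<>(doc);
        copy.put("dvField", 42); // cached instance is untouched
        return copy;
    }
}
```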
[jira] [Updated] (SOLR-8312) Add doc set size and number of buckets metrics
[ https://issues.apache.org/jira/browse/SOLR-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Sun updated SOLR-8312: -- Description: The doc set size and number of buckets represents the input data size and intermediate data size for each step of facet. Therefore they are useful metrics to be included in telemetry. The output data size is usually defined by user and not too large. Therefore the output data set size is not included. was: The doc set size and number of buckets represents the input data size and intermediate data size for each step of facet. Therefore they are useful metrics to be included in telemetry. > Add doc set size and number of buckets metrics > -- > > Key: SOLR-8312 > URL: https://issues.apache.org/jira/browse/SOLR-8312 > Project: Solr > Issue Type: Sub-task > Components: Facet Module >Reporter: Michael Sun > Fix For: Trunk > > > The doc set size and number of buckets represents the input data size and > intermediate data size for each step of facet. Therefore they are useful > metrics to be included in telemetry. > The output data size is usually defined by user and not too large. Therefore > the output data set size is not included. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[GitHub] lucene-solr pull request: WordDelimiterFilter - Don't split words ...
Github user elyograg commented on the pull request: https://github.com/apache/lucene-solr/pull/210#issuecomment-157833770 I like this idea. I think I'd go one step further -- make this behavior configurable with an attribute named something like skipKeywordTokens. It should default to true when luceneMatchVersion is 6.0 or higher, and false for anything lower. FYI, the lucene-solr github repository is a read-only mirror of the project in Apache's subversion repository. Apache's Jira installation is the official bugtracker.
[jira] [Commented] (SOLR-8283) factor out SortSpecParsing[Test] from QueryParsing[Test]
[ https://issues.apache.org/jira/browse/SOLR-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011809#comment-15011809 ] ASF subversion and git services commented on SOLR-8283: --- Commit 1715049 from [~cpoerschke] in branch 'dev/trunk' [ https://svn.apache.org/r1715049 ] SOLR-8283: factor out SortSpecParsing[Test] from QueryParsing[Test] > factor out SortSpecParsing[Test] from QueryParsing[Test] > > > Key: SOLR-8283 > URL: https://issues.apache.org/jira/browse/SOLR-8283 > Project: Solr > Issue Type: Task >Reporter: Christine Poerschke >Assignee: Christine Poerschke > Attachments: SOLR-8283-part1of2.patch, SOLR-8283-part2of2.patch, > SOLR-8283.patch > > > patch to follow -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7744) CheckIndex integration
[ https://issues.apache.org/jira/browse/SOLR-7744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011910#comment-15011910 ] Uwe Schindler commented on SOLR-7744: - Hi, I will look into this issue later. I am currently on travel, so I don't have enough time. Uwe > CheckIndex integration > -- > > Key: SOLR-7744 > URL: https://issues.apache.org/jira/browse/SOLR-7744 > Project: Solr > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > > Follow-up of LUCENE-6589. > It would be nice to integrate index consistency checking tools such as > CheckIndex into Solr, which verifies checksums and ensures that redundant > informations in the index are consistent. > Having some way to run these slow verifications would also allow for > integrating LUCENE-6589 which proposes an extension to CheckIndex for block > joins. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011925#comment-15011925 ] Keith Laban commented on SOLR-8220: --- bq. If there is a need to distinguish between docValues as an alternative to a stored field I think this would be the case only for multivalued fields, at least until we have an alternative multivalued docValues format preserving the original field values (i.e. not sorted, not a set) using something like BinaryDocValues underneath, as you mentioned earlier. bq. I'll look at what it will take to modify the LazyDocument to make this work differently. Are you already looking into it, or have some thoughts around it? Doing this properly requires us to be able to know all the possible docValues fields on a document upfront, and a way for LazyDocument to be able to load the lazy field from docValues. A large goal of this should be the ability to skip reading stored fields altogether if the field requirement is fully satisfied by docValues. However, I'm not sure whether using docValues would be more efficient than stored fields when all the fields are being returned. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. 
> I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
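Keith's short-circuit — skip the stored-fields read entirely when every requested field can be served from docValues — reduces to a set-containment check. A sketch with hypothetical field sets; in a real implementation the servable set would come from the schema and the requested set from the parsed fl parameter:

```java
import java.util.Set;

// Sketch: if every requested field is servable from docValues, the
// comparatively expensive stored-fields read can be skipped entirely.
class StoredFieldsSkip {
    static boolean canSkipStoredRead(Set<String> requested,
                                     Set<String> servableFromDocValues) {
        return servableFromDocValues.containsAll(requested);
    }
}
```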
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011950#comment-15011950 ] Yonik Seeley commented on SOLR-8220: bq. I think this would be the case only for multi valued fields at least until we had an alternative version of docValue multi valued preserving the original field (i.e. not sorted, not set) using something like BinaryDocValues underneath as you mentioned earlier. Yup, I agree. I think this is just a case of us having incomplete type support. We need to distinguish between multiValued and setValued in general. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? 
(Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011884#comment-15011884 ] Yonik Seeley commented on SOLR-8220: bq. Added a ~ glob, similar to *. fl= here means: return all conventional stored fields and all non stored docvalues. Purely from an interface perspective (I haven't looked at the code), it feels like this should be transparent. It would be nice to be able to transition from an indexed+stored field to an indexed+docValues field and not have any of the clients know/care. {code} fl=myfield // returns from either stored or docValues fl=*_i // returns all stored or docValues fields ending in _i fl=*// returns all stored fields and all docValues fields that are not stored {code} If there is a need to distinguish between docValues as an alternative to a stored field, and docValues as an implementation detail that you don't want to return to the user (say you transitioned from an indexed-only field to an indexed+docValues field or docValues-only field), then we could introduce a field flag for the schema. Something like includeInStored=true/false or asStored=true/false > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. 
> I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8306) Enhance ExpandComponent to allow expand.hits=0
[ https://issues.apache.org/jira/browse/SOLR-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010441#comment-15010441 ] Shawn Heisey commented on SOLR-8306: Thanks for helping out! I don't know anything about ExpandComponent, so I'm not qualified to review your patch. The idea sounds generally useful, though. Patches against trunk are preferred. This is where primary development occurs. Most of the changes that are applied to trunk are also backported to the stable branch, which is currently branch_5x. We can frequently work with patches against the stable branch as well, because those patches will usually apply to trunk without a lot of manual work. The tag for the latest release, which is what you used, can work very well, but sometimes it doesn't. Tags are static, so trunk and the stable branch can sometimes diverge significantly from the last release tag. When there is a lot of divergence, it can be very difficult to apply the patch to the working branches. > Enhance ExpandComponent to allow expand.hits=0 > -- > > Key: SOLR-8306 > URL: https://issues.apache.org/jira/browse/SOLR-8306 > Project: Solr > Issue Type: Improvement >Affects Versions: 5.3.1 >Reporter: Marshall Sanders >Priority: Minor > Labels: expand > Fix For: 5.3.1 > > Attachments: SOLR-8306.patch > > > This enhancement allows the ExpandComponent to allow expand.hits=0 for those > who don't want an expanded document returned and only want the numFound from > the expand section. > This is useful for "See 54 more like this" use cases, but without the > performance hit of gathering an entire expanded document. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8302) SolrResourceLoader should take a Path to its instance directory, rather than a String
[ https://issues.apache.org/jira/browse/SOLR-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010648#comment-15010648 ] Alan Woodward commented on SOLR-8302: - Yes, I think deprecation is probably the way forward. This will mean changing the name of the new method, as Java doesn't allow two methods with the same name and parameters that differ only in return type - I'll work up a patch to change the name to .getInstancePath(), with the deprecated .getInstanceDir() just redirecting to that. > SolrResourceLoader should take a Path to its instance directory, rather than > a String > - > > Key: SOLR-8302 > URL: https://issues.apache.org/jira/browse/SOLR-8302 > Project: Solr > Issue Type: Improvement >Reporter: Alan Woodward >Assignee: Alan Woodward > Attachments: SOLR-8302.patch > > > First step of SOLR-8282. We have a whole bunch of code that deals with > loading things relative to the resource loader's instance dir. These become > a lot simpler if the instance dir is a Path. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
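[Editor's note: the deprecate-and-delegate rename Alan describes can be sketched as below. The class name ResourceLoaderSketch is hypothetical, not the actual SolrResourceLoader code; it only illustrates why the new Path-returning accessor needs a different name than the deprecated String-returning one.]

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: Java treats return type as no part of a method's
// signature for overloading purposes, so the Path-returning accessor must
// get a new name (getInstancePath) while the old String-returning accessor
// is deprecated and simply delegates to it.
class ResourceLoaderSketch {
    private final Path instanceDir;

    ResourceLoaderSketch(Path instanceDir) {
        this.instanceDir = instanceDir;
    }

    /** New accessor: the instance directory as a Path. */
    public Path getInstancePath() {
        return instanceDir;
    }

    /** @deprecated use {@link #getInstancePath()} instead. */
    @Deprecated
    public String getInstanceDir() {
        // Delegates to the new method so both stay consistent during the
        // deprecation window.
        return getInstancePath().toString();
    }
}
```

Callers migrate by switching from the deprecated String accessor to the Path one; until they do, both return the same location.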
[jira] [Commented] (SOLR-8114) Grouping.java: sort variable names confusion
[ https://issues.apache.org/jira/browse/SOLR-8114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010718#comment-15010718 ] ASF subversion and git services commented on SOLR-8114: --- Commit 1714963 from [~cpoerschke] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1714963 ] SOLR-8114: correct CHANGES.txt entry location (was in 6.0.0 section but should have been 5.4.0 section instead) (merge in revision 1714960 from trunk) > Grouping.java: sort variable names confusion > > > Key: SOLR-8114 > URL: https://issues.apache.org/jira/browse/SOLR-8114 > Project: Solr > Issue Type: Wish >Reporter: Christine Poerschke >Assignee: Christine Poerschke >Priority: Minor > Fix For: 5.4, Trunk > > Attachments: SOLR-8114-part1of2.patch, SOLR-8114-part2of2.patch > > > The undistributed case i.e. {{solr/Grouping.java}}'s variable names > confusingly differ from the names used by lucene (and by the distributed > case). > Specifically the name {{groupSort}} in lucene (and in the distributed case) > means between-groups-sort but in the Grouping.java it means within-group-sort. > lucene: > {code} > TermFirstPassGroupingCollector(... Sort groupSort ...) > TermSecondPassGroupingCollector(... Sort groupSort, Sort withinGroupSort ...) > {code} > solr: > {code} > SearchGroupsFieldCommand.java: firstPassGroupingCollector = new > TermFirstPassGroupingCollector(field.getName(), groupSort, topNGroups); > TopGroupsFieldCommand.java: secondPassCollector = new > TermSecondPassGroupingCollector(... groupSort, sortWithinGroup ...); > Grouping.java:public Sort groupSort; // the sort of the documents > *within* a single group. > Grouping.java:public Sort sort;// the sort between groups > Grouping.java: firstPass = new TermFirstPassGroupingCollector(groupBy, sort, > actualGroupsToFind); > Grouping.java: secondPass = new TermSecondPassGroupingCollector(... 
sort, > groupSort ...); > {code} > This JIRA proposes to rename the Grouping.java variables to remove the > confusion: > * part 1: in Grouping.java rename groupSort to withinGroupSort > * part 2: in Grouping.java rename sort to groupSort -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-8310) Solr-5.3.1 doesn't start on CentOS Linux 5 - 64-Bit Server
Subh created SOLR-8310: -- Summary: Solr-5.3.1 doesn't start on CentOS Linux 5 - 64-Bit Server Key: SOLR-8310 URL: https://issues.apache.org/jira/browse/SOLR-8310 Project: Solr Issue Type: Bug Affects Versions: 5.3 Environment: Java Version: java version "1.8.0_65" Java(TM) SE Runtime Environment (build 1.8.0_65-b17) Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode) CentOS Version: CentOS release 5.8 (Final) Linux solrserver 2.6.18-308.el5 #1 SMP Tue Feb 21 20:06:06 EST 2012 x86_64 x86_64 x86_64 GNU/Linux Reporter: Subh Apache Solr: solr-5.3.1.tgz Java Version: java version "1.8.0_65" Java(TM) SE Runtime Environment (build 1.8.0_65-b17) Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode) CentOS Version: CentOS release 5.8 (Final) Linux solrserver 2.6.18-308.el5 #1 SMP Tue Feb 21 20:06:06 EST 2012 x86_64 x86_64 x86_64 GNU/Linux Error on start (bin/solr start): Waiting up to 30 seconds to see Solr running on port 8983lsof: unsupported TCP/TPI info selection: C lsof: unsupported TCP/TPI info selection: P lsof: unsupported TCP/TPI info selection: : lsof: unsupported TCP/TPI info selection: L lsof: unsupported TCP/TPI info selection: I lsof: unsupported TCP/TPI info selection: S lsof: unsupported TCP/TPI info selection: T lsof: unsupported TCP/TPI info selection: E lsof: unsupported TCP/TPI info selection: N lsof 4.78 latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/ latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man usage: [-?abhlnNoOPRstUvVX] [+|-c c] [+|-d s] [+D D] [+|-f[gG]] [+|-e s] [-F [f]] [-g [s]] [-i [i]] [+|-L [l]] [+m [m]] [+|-M] [-o [o]] [-p s] [+|-r [t]] [-S [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [-Z [Z]] [--] [names] Use the -h'' option to get more help information. 
lsof: unsupported TCP/TPI info selection: C lsof: unsupported TCP/TPI info selection: P lsof: unsupported TCP/TPI info selection: : lsof: unsupported TCP/TPI info selection: L lsof: unsupported TCP/TPI info selection: I lsof: unsupported TCP/TPI info selection: S lsof: unsupported TCP/TPI info selection: T lsof: unsupported TCP/TPI info selection: E lsof: unsupported TCP/TPI info selection: N lsof 4.78 latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/ latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man usage: [-?abhlnNoOPRstUvVX] [+|-c c] [+|-d s] [+D D] [+|-f[gG]] [+|-e s] [-F [f]] [-g [s]] [-i [i]] [+|-L [l]] [+m [m]] [+|-M] [-o [o]] [-p s] [+|-r [t]] [-S [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [-Z [Z]] [--] [names] Use the-h'' option to get more help information. [] lsof: unsupported TCP/TPI info selection: C lsof: unsupported TCP/TPI info selection: P lsof: unsupported TCP/TPI info selection: : lsof: unsupported TCP/TPI info selection: L lsof: unsupported TCP/TPI info selection: I lsof: unsupported TCP/TPI info selection: S lsof: unsupported TCP/TPI info selection: T lsof: unsupported TCP/TPI info selection: E lsof: unsupported TCP/TPI info selection: N lsof 4.78 latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/ latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man usage: [-?abhlnNoOPRstUvVX] [+|-c c] [+|-d s] [+D D] [+|-f[gG]] [+|-e s] [-F [f]] [-g [s]] [-i [i]] [+|-L [l]] [+m [m]] [+|-M] [-o [o]] [-p s] [+|-r [t]] [-S [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [-Z [Z]] [--] [names] Use the -h'' option to get more help information. 
[\] lsof: unsupported TCP/TPI info selection: C lsof: unsupported TCP/TPI info selection: P lsof: unsupported TCP/TPI info selection: : lsof: unsupported TCP/TPI info selection: L lsof: unsupported TCP/TPI info selection: I lsof: unsupported TCP/TPI info selection: S lsof: unsupported TCP/TPI info selection: T lsof: unsupported TCP/TPI info selection: E lsof: unsupported TCP/TPI info selection: N lsof 4.78 latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/ latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man usage: [-?abhlnNoOPRstUvVX] [+|-c c] [+|-d s] [+D D] [+|-f[gG]] [+|-e s] [-F [f]] [-g [s]] [-i [i]] [+|-L [l]] [+m [m]] [+|-M] [-o [o]] [-p s] [+|-r [t]] [-S [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [-Z [Z]] [--] [names] Use the-h'' option to get more help information. [] lsof: unsupported TCP/TPI info selection: C lsof: unsupported TCP/TPI info selection: P lsof: unsupported TCP/TPI info selection: : lsof: unsupported TCP/TPI info selection: L lsof: unsupported TCP/TPI info selection: I lsof: unsupported TCP/TPI info selection: S lsof: unsupported TCP/TPI info selection: T lsof: unsupported TCP/TPI info selection: E lsof: unsupported TCP/TPI
[jira] [Updated] (SOLR-8283) factor out SortSpecParsing[Test] from QueryParsing[Test]
[ https://issues.apache.org/jira/browse/SOLR-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christine Poerschke updated SOLR-8283: -- Attachment: SOLR-8283-part1of2.patch Splitting patch into two parts for easier and clearer handling: * part 1: factor out StrParser from QueryParsing.StrParser * part 2: factor out SortSpecParsing[Test] from QueryParsing[Test] (both QueryParsing and SortSpecParsing use the StrParser class) > factor out SortSpecParsing[Test] from QueryParsing[Test] > > > Key: SOLR-8283 > URL: https://issues.apache.org/jira/browse/SOLR-8283 > Project: Solr > Issue Type: Task >Reporter: Christine Poerschke >Assignee: Christine Poerschke > Attachments: SOLR-8283-part1of2.patch, SOLR-8283.patch > > > patch to follow -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8310) Solr-5.3.1 doesn't start on CentOS Linux 5 - 64-Bit Server
[ https://issues.apache.org/jira/browse/SOLR-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subh updated SOLR-8310: --- Environment: java version "1.8.0_65", CentOS release 5.8 (Final) (was: Java Version: java version "1.8.0_65" Java(TM) SE Runtime Environment (build 1.8.0_65-b17) Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode) CentOS Version: CentOS release 5.8 (Final) Linux solrserver 2.6.18-308.el5 #1 SMP Tue Feb 21 20:06:06 EST 2012 x86_64 x86_64 x86_64 GNU/Linux) > Solr-5.3.1 doesn't start on CentOS Linux 5 - 64-Bit Server > -- > > Key: SOLR-8310 > URL: https://issues.apache.org/jira/browse/SOLR-8310 > Project: Solr > Issue Type: Bug >Affects Versions: 5.3 > Environment: java version "1.8.0_65", CentOS release 5.8 (Final) >Reporter: Subh > Labels: CentOS5, Jdk-1.8.0_65, Solr > > Apache Solr: > solr-5.3.1.tgz > Java Version: > java version "1.8.0_65" > Java(TM) SE Runtime Environment (build 1.8.0_65-b17) > Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode) > CentOS Version: > CentOS release 5.8 (Final) > Linux solrserver 2.6.18-308.el5 #1 SMP Tue Feb 21 20:06:06 EST 2012 x86_64 > x86_64 x86_64 GNU/Linux > Error on start (bin/solr start): > Waiting up to 30 seconds to see Solr running on port 8983lsof: unsupported > TCP/TPI info selection: C > lsof: unsupported TCP/TPI info selection: P > lsof: unsupported TCP/TPI info selection: : > lsof: unsupported TCP/TPI info selection: L > lsof: unsupported TCP/TPI info selection: I > lsof: unsupported TCP/TPI info selection: S > lsof: unsupported TCP/TPI info selection: T > lsof: unsupported TCP/TPI info selection: E > lsof: unsupported TCP/TPI info selection: N > lsof 4.78 > latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/ > latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ > latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man > usage: [-?abhlnNoOPRstUvVX] [+|-c c] [+|-d s] [+D D] [+|-f[gG]] [+|-e s] > [-F [f]] [-g 
[s]] [-i [i]] [+|-L [l]] [+m [m]] [+|-M] [-o [o]] > [-p s] [+|-r [t]] [-S [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [-Z [Z]] [--] > [names] > Use the -h'' option to get more help information. lsof: unsupported TCP/TPI > info selection: C lsof: unsupported TCP/TPI info selection: P lsof: > unsupported TCP/TPI info selection: : lsof: unsupported TCP/TPI info > selection: L lsof: unsupported TCP/TPI info selection: I lsof: unsupported > TCP/TPI info selection: S lsof: unsupported TCP/TPI info selection: T lsof: > unsupported TCP/TPI info selection: E lsof: unsupported TCP/TPI info > selection: N lsof 4.78 latest revision: > ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/ latest FAQ: > ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ latest man page: > ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man usage: > [-?abhlnNoOPRstUvVX] [+|-c c] [+|-d s] [+D D] [+|-f[gG]] [+|-e s] [-F [f]] > [-g [s]] [-i [i]] [+|-L [l]] [+m [m]] [+|-M] [-o [o]] [-p s] [+|-r [t]] [-S > [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [-Z [Z]] [--] [names] Use the-h'' > option to get more help information. 
> [] lsof: unsupported TCP/TPI info selection: C > lsof: unsupported TCP/TPI info selection: P > lsof: unsupported TCP/TPI info selection: : > lsof: unsupported TCP/TPI info selection: L > lsof: unsupported TCP/TPI info selection: I > lsof: unsupported TCP/TPI info selection: S > lsof: unsupported TCP/TPI info selection: T > lsof: unsupported TCP/TPI info selection: E > lsof: unsupported TCP/TPI info selection: N > lsof 4.78 > latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/ > latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ > latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man > usage: [-?abhlnNoOPRstUvVX] [+|-c c] [+|-d s] [+D D] [+|-f[gG]] [+|-e s] > [-F [f]] [-g [s]] [-i [i]] [+|-L [l]] [+m [m]] [+|-M] [-o [o]] > [-p s] [+|-r [t]] [-S [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [-Z [Z]] [--] > [names] > Use the -h'' option to get more help information. [\] lsof: unsupported > TCP/TPI info selection: C lsof: unsupported TCP/TPI info selection: P lsof: > unsupported TCP/TPI info selection: : lsof: unsupported TCP/TPI info > selection: L lsof: unsupported TCP/TPI info selection: I lsof: unsupported > TCP/TPI info selection: S lsof: unsupported TCP/TPI info selection: T lsof: > unsupported TCP/TPI info selection: E lsof: unsupported TCP/TPI info > selection: N lsof 4.78 latest revision: > ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/ latest FAQ: > ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ latest man page: > ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man usage: > [-?abhlnNoOPRstUvVX] [+|-c c] [+|-d s] [+D D] [+|-f[gG]] [+|-e s] [-F [f]] > [-g [s]] [-i [i]] [+|-L [l]] [+m [m]] [+|-M] [-o [o]] [-p s] [+|-r [t]]
[JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 856 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/856/ 3 tests failed. FAILED: org.apache.solr.cloud.hdfs.HdfsCollectionsAPIDistributedZkTest.test Error Message: Error from server at http://127.0.0.1:46164/awholynewcollection_0: Expected mime type application/octet-stream but got text/html.Error 500HTTP ERROR: 500 Problem accessing /awholynewcollection_0/select. Reason: {msg=Error trying to proxy request for url: http://127.0.0.1:46308/awholynewcollection_0/select,trace=org.apache.solr.common.SolrException: Error trying to proxy request for url: http://127.0.0.1:46308/awholynewcollection_0/select at org.apache.solr.servlet.HttpSolrCall.remoteQuery(HttpSolrCall.java:596) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:444) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:109) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83) at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:364) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:499) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool at org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:226) at org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:195) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:423) at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) at org.apache.solr.servlet.HttpSolrCall.remoteQuery(HttpSolrCall.java:563) ... 24 more ,code=500} Powered by Jetty:// Stack Trace: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:46164/awholynewcollection_0: Expected mime type application/octet-stream but got text/html. Error 500 HTTP ERROR: 500 Problem accessing /awholynewcollection_0/select. 
Reason: {msg=Error trying to proxy request for url: http://127.0.0.1:46308/awholynewcollection_0/select,trace=org.apache.solr.common.SolrException: Error trying to proxy request for url: http://127.0.0.1:46308/awholynewcollection_0/select at org.apache.solr.servlet.HttpSolrCall.remoteQuery(HttpSolrCall.java:596) at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:444) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:109) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83) at
[jira] [Updated] (SOLR-8302) SolrResourceLoader should take a Path to its instance directory, rather than a String
[ https://issues.apache.org/jira/browse/SOLR-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward updated SOLR-8302: Attachment: SOLR-8302.patch Updated patch with deprecations. > SolrResourceLoader should take a Path to its instance directory, rather than > a String > - > > Key: SOLR-8302 > URL: https://issues.apache.org/jira/browse/SOLR-8302 > Project: Solr > Issue Type: Improvement >Reporter: Alan Woodward >Assignee: Alan Woodward > Attachments: SOLR-8302.patch, SOLR-8302.patch > > > First step of SOLR-8282. We have a whole bunch of code that deals with > loading things relative to the resource loader's instance dir. These become > a lot simpler if the instance dir is a Path. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8114) Grouping.java: sort variable names confusion
[ https://issues.apache.org/jira/browse/SOLR-8114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010676#comment-15010676 ] ASF subversion and git services commented on SOLR-8114: --- Commit 1714960 from [~cpoerschke] in branch 'dev/trunk' [ https://svn.apache.org/r1714960 ] SOLR-8114: correct CHANGES.txt entry location (was in 6.0.0 section but should have been 5.4.0 section instead) > Grouping.java: sort variable names confusion > > > Key: SOLR-8114 > URL: https://issues.apache.org/jira/browse/SOLR-8114 > Project: Solr > Issue Type: Wish >Reporter: Christine Poerschke >Assignee: Christine Poerschke >Priority: Minor > Fix For: 5.4, Trunk > > Attachments: SOLR-8114-part1of2.patch, SOLR-8114-part2of2.patch > > > The undistributed case i.e. {{solr/Grouping.java}}'s variable names > confusingly differ from the names used by lucene (and by the distributed > case). > Specifically the name {{groupSort}} in lucene (and in the distributed case) > means between-groups-sort but in the Grouping.java it means within-group-sort. > lucene: > {code} > TermFirstPassGroupingCollector(... Sort groupSort ...) > TermSecondPassGroupingCollector(... Sort groupSort, Sort withinGroupSort ...) > {code} > solr: > {code} > SearchGroupsFieldCommand.java: firstPassGroupingCollector = new > TermFirstPassGroupingCollector(field.getName(), groupSort, topNGroups); > TopGroupsFieldCommand.java: secondPassCollector = new > TermSecondPassGroupingCollector(... groupSort, sortWithinGroup ...); > Grouping.java:public Sort groupSort; // the sort of the documents > *within* a single group. > Grouping.java:public Sort sort;// the sort between groups > Grouping.java: firstPass = new TermFirstPassGroupingCollector(groupBy, sort, > actualGroupsToFind); > Grouping.java: secondPass = new TermSecondPassGroupingCollector(... 
sort, > groupSort ...); > {code} > This JIRA proposes to rename the Grouping.java variables to remove the > confusion: > * part 1: in Grouping.java rename groupSort to withinGroupSort > * part 2: in Grouping.java rename sort to groupSort -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6837) Add N-best output capability to JapaneseTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010673#comment-15010673 ] Christian Moen commented on LUCENE-6837: Tokenizing Japanese Wikipedia seems fine with nBestCost set, but it seems like random-blasting doesn't pass. Konno-san, I'm wondering if I can trouble you to look into why {{testRandomHugeStrings}} fails with the latest patch? The test basically does random-blasting with nBestCost set to 2000. I think it's a good idea that we fix this before we commit. I believe it's easily reproducible, but I used {noformat} ant test -Dtestcase=TestJapaneseTokenizer -Dtests.method=testRandomHugeStrings -Dtests.seed=99EB179B92E66345 -Dtests.slow=true -Dtests.locale=sr_CS -Dtests.timezone=PNT -Dtests.asserts=true -Dtests.file.encoding=US-ASCII {noformat} in my environment. > Add N-best output capability to JapaneseTokenizer > - > > Key: LUCENE-6837 > URL: https://issues.apache.org/jira/browse/LUCENE-6837 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 5.3 >Reporter: KONNO, Hiroharu >Assignee: Christian Moen >Priority: Minor > Attachments: LUCENE-6837.patch, LUCENE-6837.patch, LUCENE-6837.patch > > > Japanese morphological analyzers often generate mis-segmented tokens. N-best > output reduces the impact of mis-segmentation on search results. N-best output > is more meaningful than character N-gram, and it increases hit count too. > If you use N-best output, you can get decompounded tokens (ex: > "シニアソフトウェアエンジニア" => {"シニア", "シニアソフトウェアエンジニア", "ソフトウェア", "エンジニア"}) and > overwrapped tokens (ex: "数学部長谷川" => {"数学", "部", "部長", "長谷川", "谷川"}), > depending on the dictionary and N-best parameter settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8292) TransactionLog.next() does not honor contract and return null for EOF
[ https://issues.apache.org/jira/browse/SOLR-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010699#comment-15010699 ] Renaud Delbru commented on SOLR-8292: - I have checked on the cdcr code side, and whenever a log reader is used, it is by a single thread only. So the problem might be lying somewhere else. > TransactionLog.next() does not honor contract and return null for EOF > - > > Key: SOLR-8292 > URL: https://issues.apache.org/jira/browse/SOLR-8292 > Project: Solr > Issue Type: Bug >Reporter: Erick Erickson > > This came to light in CDCR testing, which stresses this code a lot, there's a > stack trace showing this line (641 trunk) throwing an EOF exception: > o = codec.readVal(fis); > At first I thought to just wrap reading fis in a try/catch and return null, > but looking at the code a bit more I'm not so sure, that seems like it'd mask > what looks at first glance like a bug in the logic. > A few lines earlier (633-4) there are these lines: > // shouldn't currently happen - header and first record are currently written > at the same time > if (fis.position() >= fos.size()) { > Why are we comparing the input file position against the size of the > output file? Maybe because the 'i' key is right next to the 'o' key? The > comment hints that it's checking for the ability to read the first record in > input stream along with the header. And perhaps there's a different issue > here because the expectation clearly is that the first record should be there > if the header is. > So what's the right thing to do? Wrap in a try/catch and return null for EOF? > Change the test? Do both? > I can take care of either, but wanted a clue whether the comparison of fis to > fos is intended. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
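[Editor's note: the "wrap the read in try/catch and return null for EOF" option raised in the issue can be sketched as below. This is not the actual TransactionLog code; LogReaderSketch and readNext() are hypothetical stand-ins for TransactionLog and next(), and DataInputStream.readInt() stands in for codec.readVal(fis).]

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: honor the documented contract by mapping a clean
// end-of-file to null, while letting other I/O failures surface so that
// real bugs are not masked by the EOF handling.
class LogReaderSketch {
    private final DataInputStream in;

    LogReaderSketch(byte[] rawLog) {
        this.in = new DataInputStream(new ByteArrayInputStream(rawLog));
    }

    /** Returns the next record, or null once end-of-file is reached. */
    Integer readNext() {
        try {
            return in.readInt();
        } catch (EOFException eof) {
            // Clean EOF: the contract says return null rather than throw.
            return null;
        } catch (IOException ioe) {
            // Any other I/O problem is still propagated, not swallowed.
            throw new UncheckedIOException(ioe);
        }
    }
}
```

The narrow catch on EOFException is the point: catching IOException wholesale would mask exactly the kind of logic bug the comment worries about.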
[jira] [Commented] (SOLR-8260) Use NIO2 APIs in core discovery
[ https://issues.apache.org/jira/browse/SOLR-8260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010636#comment-15010636 ] Alan Woodward commented on SOLR-8260: - Passing the exception itself to the logger generally ends up with a stacktrace being written out, which I don't think would be particularly useful here. But I like the idea of replacing e.getMessage() with e.toString(), will update. Thanks! > Use NIO2 APIs in core discovery > --- > > Key: SOLR-8260 > URL: https://issues.apache.org/jira/browse/SOLR-8260 > Project: Solr > Issue Type: Improvement >Reporter: Alan Woodward >Assignee: Alan Woodward > Fix For: 5.4 > > Attachments: SOLR-8260.patch > > > CorePropertiesLocator currently does all its file system interaction using > java.io.File and friends, which have all sorts of drawbacks with regard to > error handling and reporting. We've been on java 7 for a while now, so we > should use the nio2 Path APIs instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
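The kind of NIO2-based discovery and `e.toString()` error reporting discussed above can be sketched as follows. The directory layout and method names are hypothetical illustrations, not CorePropertiesLocator's actual API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CoreDiscoveryDemo {
    // Walk coreRoot looking for core.properties files. On failure, log
    // e.toString() (exception class + message) rather than a full stack
    // trace, per the discussion above.
    static List<Path> discover(Path coreRoot) {
        try (Stream<Path> paths = Files.walk(coreRoot)) {
            return paths.filter(p -> p.getFileName().toString().equals("core.properties"))
                        .collect(Collectors.toList());
        } catch (IOException e) {
            System.err.println("Couldn't walk " + coreRoot + ": " + e); // implicit e.toString()
            return List.of();
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("cores");
        Path core = Files.createDirectories(root.resolve("collection1"));
        Files.createFile(core.resolve("core.properties"));
        System.out.println(discover(root).size()); // 1
    }
}
```

Unlike `java.io.File`, the NIO2 calls throw descriptive `IOException`s (e.g. `NoSuchFileException`, `AccessDeniedException`), which is the error-handling improvement the issue is after.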
[jira] [Resolved] (LUCENE-6899) Upgrade randomizedtesting to 2.3.1
[ https://issues.apache.org/jira/browse/LUCENE-6899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-6899. - Resolution: Fixed > Upgrade randomizedtesting to 2.3.1 > -- > > Key: LUCENE-6899 > URL: https://issues.apache.org/jira/browse/LUCENE-6899 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > Fix For: Trunk, 5.4 > > Attachments: LUCENE-6899.patch > >
[jira] [Commented] (LUCENE-6899) Upgrade randomizedtesting to 2.3.1
[ https://issues.apache.org/jira/browse/LUCENE-6899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010510#comment-15010510 ] ASF subversion and git services commented on LUCENE-6899: - Commit 1714952 from [~dawidweiss] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1714952 ] LUCENE-6899: upgrade randomizedtesting to version 2.3.1 > Upgrade randomizedtesting to 2.3.1 > -- > > Key: LUCENE-6899 > URL: https://issues.apache.org/jira/browse/LUCENE-6899 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > Fix For: Trunk, 5.4 > > Attachments: LUCENE-6899.patch > >
[jira] [Commented] (SOLR-7912) Add support for boost and exclude the queried document id in MoreLikeThis QParser
[ https://issues.apache.org/jira/browse/SOLR-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010568#comment-15010568 ] Jens Wille commented on SOLR-7912: -- [~anshumg], is there any chance this patch can make it into the upcoming release? > Add support for boost and exclude the queried document id in MoreLikeThis > QParser > - > > Key: SOLR-7912 > URL: https://issues.apache.org/jira/browse/SOLR-7912 > Project: Solr > Issue Type: Improvement >Reporter: Anshum Gupta >Assignee: Anshum Gupta > Attachments: SOLR-7912.patch, SOLR-7912.patch, SOLR-7912.patch, > SOLR-7912.patch > > > Continuing from SOLR-7639. We need to support boost, and also exclude input > document from returned doc list. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6899) Upgrade randomizedtesting to 2.3.1
[ https://issues.apache.org/jira/browse/LUCENE-6899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-6899: Description: This has a bunch of internal and some external improvements, overview here: https://github.com/randomizedtesting/randomizedtesting/releases > Upgrade randomizedtesting to 2.3.1 > -- > > Key: LUCENE-6899 > URL: https://issues.apache.org/jira/browse/LUCENE-6899 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > Fix For: Trunk, 5.4 > > Attachments: LUCENE-6899.patch > > > This has a bunch of internal and some external improvements, overview here: > https://github.com/randomizedtesting/randomizedtesting/releases
[jira] [Commented] (SOLR-8302) SolrResourceLoader should take a Path to its instance directory, rather than a String
[ https://issues.apache.org/jira/browse/SOLR-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010466#comment-15010466 ] Shawn Heisey commented on SOLR-8302: Strong +1 for this idea! I haven't reviewed the patch yet. When I started looking at NIO2 conversion for Solr (in general, not limited to this class), I noticed a lot of code that concatenates strings of hardcoded filesystem or resource paths with forward slashes. The code would be much cleaner and cross-platform with resolve and other NIO2 methods. I personally would be OK with simply changing the API, but the javadoc at the class level does not actually say that it is expert or internal. I'm guessing deprecation will be required, unless it's sufficient to add the internal/expert designation in the javadoc at the same time as this change. I do see one method currently marked as expert, but I don't think that method is affected by this patch. > SolrResourceLoader should take a Path to its instance directory, rather than > a String > - > > Key: SOLR-8302 > URL: https://issues.apache.org/jira/browse/SOLR-8302 > Project: Solr > Issue Type: Improvement >Reporter: Alan Woodward >Assignee: Alan Woodward > Attachments: SOLR-8302.patch > > > First step of SOLR-8282. We have a whole bunch of code that deals with > loading things relative to the resource loader's instance dir. These become > a lot simpler if the instance dir is a Path.
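The point about slash concatenation vs. `resolve` can be illustrated with a tiny sketch; the file names here are chosen for illustration only, not taken from SolrResourceLoader.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ResolveDemo {
    public static void main(String[] args) {
        // Hard-coded "/" concatenation bakes a separator into a String...
        String concatenated = "conf" + "/" + "solrconfig.xml";

        // ...while Path.resolve uses the platform's separator and keeps
        // the result as a structured Path, not an opaque String.
        Path instanceDir = Paths.get("conf");
        Path config = instanceDir.resolve("solrconfig.xml");

        System.out.println(config.getFileName());  // solrconfig.xml
        System.out.println(config.getNameCount()); // 2
    }
}
```

A structured `Path` also composes with the rest of NIO2 (`Files.exists`, `Files.newInputStream`, relativize/normalize), which is what makes a `Path`-typed instance dir simplify the loading code.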
[jira] [Updated] (LUCENE-6837) Add N-best output capability to JapaneseTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Moen updated LUCENE-6837: --- Attachment: LUCENE-6837.patch > Add N-best output capability to JapaneseTokenizer > - > > Key: LUCENE-6837 > URL: https://issues.apache.org/jira/browse/LUCENE-6837 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Affects Versions: 5.3 >Reporter: KONNO, Hiroharu >Assignee: Christian Moen >Priority: Minor > Attachments: LUCENE-6837.patch, LUCENE-6837.patch, LUCENE-6837.patch > > > Japanese morphological analyzers often generate mis-segmented tokens. N-best > output reduces the impact of mis-segmentation on search result. N-best output > is more meaningful than character N-gram, and it increases hit count too. > If you use N-best output, you can get decompounded tokens (ex: > "シニアソフトウェアエンジニア" => {"シニア", "シニアソフトウェアエンジニア", "ソフトウェア", "エンジニア"}) and > overwrapped tokens (ex: "数学部長谷川" => {"数学", "部", "部長", "長谷川", "谷川"}), > depending on the dictionary and N-best parameter settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8307) XXE Vulnerability
[ https://issues.apache.org/jira/browse/SOLR-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011084#comment-15011084 ] Uwe Schindler commented on SOLR-8307: - I checked the code: where is the XXE risk? The stream.body is going through a safe parser. So do you have a testcase? How did you find out that there is an XXE issue? I spent a whole week on fixing all these problems, so how could they reappear? There are also tests that check to prevent XXE in some places! The attached patch only fixes SolrJ, but this is not really a security issue, because it is used to connect to Solr and not arbitrary web sites. > XXE Vulnerability > - > > Key: SOLR-8307 > URL: https://issues.apache.org/jira/browse/SOLR-8307 > Project: Solr > Issue Type: Bug > Components: UI >Affects Versions: 5.3 >Reporter: Adam Johnson > Attachments: SOLR-8307.patch > > > Use the drop-down in the left menu to select a core. Use the “Watch Changes” > feature under the “Plugins / Stats” option. When submitting the changes, XML > is passed in the “stream.body” parameter and is vulnerable to XXE.
[jira] [Updated] (SOLR-7539) Add a QueryAutofilteringComponent for query introspection using indexed metadata
[ https://issues.apache.org/jira/browse/SOLR-7539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Sullivan updated SOLR-7539: --- Attachment: SOLR-7539.patch > Add a QueryAutofilteringComponent for query introspection using indexed > metadata > > > Key: SOLR-7539 > URL: https://issues.apache.org/jira/browse/SOLR-7539 > Project: Solr > Issue Type: New Feature >Reporter: Ted Sullivan >Priority: Minor > Fix For: Trunk > > Attachments: SOLR-7539.patch, SOLR-7539.patch, SOLR-7539.patch, > SOLR-7539.patch > > > The Query Autofiltering Component provides a method of inferring user intent > by matching noun phrases that are typically used for faceted-navigation into > Solr filter or boost queries (depending on configuration settings) so that > more precise user queries can be met with more precise results. > The algorithm uses a "longest contiguous phrase match" strategy which allows > it to disambiguate queries where single terms are ambiguous but phrases are > not. It will work when there is structured information in the form of String > fields that are normally used for faceted navigation. It works across fields > by building a map of search term to index field using the Lucene FieldCache > (UninvertingReader). This enables users to create free text, multi-term > queries that combine attributes across facet fields - as if they had searched > and then navigated through several facet layers. To address the problem of > exact-match only semantics of String fields, support for synonyms (including > multi-term synonyms) and stemming was added. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8307) XXE Vulnerability
[ https://issues.apache.org/jira/browse/SOLR-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011010#comment-15011010 ] Erik Hatcher commented on SOLR-8307: At a quick glance, it looks like XMLResponseParser ought to {code} EmptyEntityResolver.configureXMLInputFactory(factory); {code} That's something that [~thetaphi] probably put in to prevent this issue in other places. > XXE Vulnerability > - > > Key: SOLR-8307 > URL: https://issues.apache.org/jira/browse/SOLR-8307 > Project: Solr > Issue Type: Bug > Components: UI >Affects Versions: 5.3 >Reporter: Adam Johnson > Attachments: SOLR-8307.patch > > > Use the drop-down in the left menu to select a core. Use the “Watch Changes” > feature under the “Plugins / Stats” option. When submitting the changes, XML > is passed in the “stream.body” parameter and is vulnerable to XXE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
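The hardening Erik points to can be sketched with plain StAX. This is not the actual `EmptyEntityResolver.configureXMLInputFactory` implementation (which may do more); it only shows the standard `XMLInputFactory` properties that refuse DTDs and external entities on network-facing XML, which is the class of fix being discussed.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class SafeStaxDemo {
    // Build an XMLInputFactory that cannot resolve external entities and
    // rejects DTDs entirely -- appropriate for XML received over the network,
    // but NOT for Solr config files, where xinclude/entities are features.
    static XMLInputFactory safeFactory() {
        XMLInputFactory f = XMLInputFactory.newInstance();
        f.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        f.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        return f;
    }

    public static void main(String[] args) throws Exception {
        // Ordinary response XML still parses normally with the safe factory.
        XMLStreamReader r = safeFactory().createXMLStreamReader(
                new StringReader("<response><str>ok</str></response>"));
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.CHARACTERS) {
                System.out.println(r.getText()); // ok
            }
        }
    }
}
```

With `SUPPORT_DTD` false, a payload containing `<!DOCTYPE ... <!ENTITY xxe SYSTEM "file:///...">>` fails at parse time with an `XMLStreamException` instead of reading the referenced file.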
[jira] [Commented] (SOLR-8307) XXE Vulnerability
[ https://issues.apache.org/jira/browse/SOLR-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011071#comment-15011071 ] Uwe Schindler commented on SOLR-8307: - The patch attached here just modifies SolrJ. How is this related to config file parsing? > XXE Vulnerability > - > > Key: SOLR-8307 > URL: https://issues.apache.org/jira/browse/SOLR-8307 > Project: Solr > Issue Type: Bug > Components: UI >Affects Versions: 5.3 >Reporter: Adam Johnson > Attachments: SOLR-8307.patch > > > Use the drop-down in the left menu to select a core. Use the “Watch Changes” > feature under the “Plugins / Stats” option. When submitting the changes, XML > is passed in the “stream.body” parameter and is vulnerable to XXE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8283) factor out SortSpecParsing[Test] from QueryParsing[Test]
[ https://issues.apache.org/jira/browse/SOLR-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010974#comment-15010974 ] ASF subversion and git services commented on SOLR-8283: --- Commit 1714994 from [~cpoerschke] in branch 'dev/trunk' [ https://svn.apache.org/r1714994 ] SOLR-8283: factor out StrParser from QueryParsing.StrParser (Christine Poerschke) > factor out SortSpecParsing[Test] from QueryParsing[Test] > > > Key: SOLR-8283 > URL: https://issues.apache.org/jira/browse/SOLR-8283 > Project: Solr > Issue Type: Task >Reporter: Christine Poerschke >Assignee: Christine Poerschke > Attachments: SOLR-8283-part1of2.patch, SOLR-8283.patch > > > patch to follow -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7539) Add a QueryAutofilteringComponent for query introspection using indexed metadata
[ https://issues.apache.org/jira/browse/SOLR-7539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010967#comment-15010967 ] Ted Sullivan commented on SOLR-7539: Thanks [~jmlucjav] - we seem to be on the same page. Thanks for the reference link to your blog. You make some very interesting and useful points. To answer your question, the query autofilter has a configuration to exclude fields that you don't want to be autofiltered. This may also be useful to protect the FST lookup in cases where the field has a very large number of values. > Add a QueryAutofilteringComponent for query introspection using indexed > metadata > > > Key: SOLR-7539 > URL: https://issues.apache.org/jira/browse/SOLR-7539 > Project: Solr > Issue Type: New Feature >Reporter: Ted Sullivan >Priority: Minor > Fix For: Trunk > > Attachments: SOLR-7539.patch, SOLR-7539.patch, SOLR-7539.patch > > > The Query Autofiltering Component provides a method of inferring user intent > by matching noun phrases that are typically used for faceted-navigation into > Solr filter or boost queries (depending on configuration settings) so that > more precise user queries can be met with more precise results. > The algorithm uses a "longest contiguous phrase match" strategy which allows > it to disambiguate queries where single terms are ambiguous but phrases are > not. It will work when there is structured information in the form of String > fields that are normally used for faceted navigation. It works across fields > by building a map of search term to index field using the Lucene FieldCache > (UninvertingReader). This enables users to create free text, multi-term > queries that combine attributes across facet fields - as if they had searched > and then navigated through several facet layers. To address the problem of > exact-match only semantics of String fields, support for synonyms (including > multi-term synonyms) and stemming was added. 
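The "longest contiguous phrase match" strategy from the component description can be sketched greedily: at each query position, prefer the longest phrase found in the term-to-field map. The map, field names, and query below are invented for illustration; Solr's component builds its map from the FieldCache rather than a literal `Map`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PhraseMatchDemo {
    // Greedy longest-contiguous-phrase matcher: emits one filter clause per
    // matched phrase, letting unambiguous phrases win over ambiguous terms.
    static List<String> match(String[] tokens, Map<String, String> termToField) {
        List<String> filters = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int bestLen = 0;
            String bestFilter = null;
            StringBuilder phrase = new StringBuilder();
            for (int j = i; j < tokens.length; j++) {
                if (phrase.length() > 0) phrase.append(' ');
                phrase.append(tokens[j]);
                String field = termToField.get(phrase.toString());
                if (field != null) { // longer match overrides shorter one
                    bestLen = j - i + 1;
                    bestFilter = field + ":\"" + phrase + "\"";
                }
            }
            if (bestFilter != null) {
                filters.add(bestFilter);
                i += bestLen; // consume the matched phrase
            } else {
                i++; // no facet value starts here; treat as free text
            }
        }
        return filters;
    }

    public static void main(String[] args) {
        // "new" alone is ambiguous; "new york" is not, so the phrase wins.
        Map<String, String> map = Map.of(
                "new york", "city_s",
                "new", "condition_s");
        System.out.println(match("shoes new york".split(" "), map));
        // [city_s:"new york"]
    }
}
```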
[jira] [Commented] (SOLR-7539) Add a QueryAutofilteringComponent for query introspection using indexed metadata
[ https://issues.apache.org/jira/browse/SOLR-7539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011006#comment-15011006 ] Ted Sullivan commented on SOLR-7539: I have just uploaded a new patch that adds what I call "verb support" to the autofilter. This enables you to specify terms that will constrain the autofilter field choices. The example I have been using for this is a Music Ontology that demonstrates the features of the query autofilter. Suppose I have records of musicians, songwriters, songs, etc. with fields like performer_ss, composer_ss, composition_type_s and so on, and I search for "songs written by Johnny Cash" vs. "songs performed by Johnny Cash". Without the verb support, the autofilter would pick up composition_type_s:Song and performer_ss:"Johnny Cash" OR composer_ss:"Johnny Cash", because this artist has documents in which he is either (or both) a performer and a songwriter. That is, both queries would return the same results, because neither 'written' nor 'performed' is a value in any document field. By adding configurations like the following, with some supporting code:

written,wrote,composed:composer_ss
performed,played,sang,recorded:performer_ss

the above queries work as expected. The code detects the presence of the modifier in proximity to a term that occurs in the search field (for 'written' that would be 'composer_ss') and then collapses the choices to that field alone, so (composer_ss:"Johnny Cash" OR performer_ss:"Johnny Cash") becomes just composer_ss:"Johnny Cash" when the verb is 'written' and performer_ss:"Johnny Cash" when the verb is 'performed'.
In addition, noun phrases that are composed of two different nouns in which one acts as a qualifier of the other, as in "Beatles Songs", are handled with this configuration:

covered,covers:performer_ss|version_s:Cover|original_performer_s:_ENTITY_,recording_type_ss:Song=>original_performer_s:_ENTITY_

In this case, "Beatles Songs" is a single noun phrase that refers to songs written by one or more of the Beatles. With this configuration and supporting code, we can now disambiguate queries like "Beatles Songs covered" - which are covers of Beatles songs by other artists - from "songs Beatles covered" - which are songs performed by the Beatles that were written by other songwriters. Two test cases have been added to the patch to demonstrate these new features. > Add a QueryAutofilteringComponent for query introspection using indexed > metadata > > > Key: SOLR-7539 > URL: https://issues.apache.org/jira/browse/SOLR-7539 > Project: Solr > Issue Type: New Feature >Reporter: Ted Sullivan >Priority: Minor > Fix For: Trunk > > Attachments: SOLR-7539.patch, SOLR-7539.patch, SOLR-7539.patch, > SOLR-7539.patch > > > The Query Autofiltering Component provides a method of inferring user intent > by matching noun phrases that are typically used for faceted-navigation into > Solr filter or boost queries (depending on configuration settings) so that > more precise user queries can be met with more precise results. > The algorithm uses a "longest contiguous phrase match" strategy which allows > it to disambiguate queries where single terms are ambiguous but phrases are > not. It will work when there is structured information in the form of String > fields that are normally used for faceted navigation. It works across fields > by building a map of search term to index field using the Lucene FieldCache > (UninvertingReader). 
This enables users to create free text, multi-term > queries that combine attributes across facet fields - as if they had searched > and then navigated through several facet layers. To address the problem of > exact-match only semantics of String fields, support for synonyms (including > multi-term synonyms) and stemming was added. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8307) XXE Vulnerability
[ https://issues.apache.org/jira/browse/SOLR-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011065#comment-15011065 ] Uwe Schindler commented on SOLR-8307: - Hi, it should use the code pattern as Erik described. Disabling DTDs completely is not a good idea. In general, all XML parsing of resources coming from the network should follow the same pattern. The EmptyEntityResolver has methods for *all* types of XML parsers to disable external entities, so use its methods to configure them. Grep for EmptyEntityResolver and you will see that all of the above listed parsers are fine (unless somebody broke them again). _Please note:_ This only affects XML coming from the network. Please don't disable xinclude or external entities in Solr's config files. Those should not be accessible through the internet anyway; if they are, you have bigger problems. It is an officially documented feature that you can use xinclude and external entities to split your Solr config files (I generally place the field types and fields each in a separate XML file and include them into the schema). > XXE Vulnerability > - > > Key: SOLR-8307 > URL: https://issues.apache.org/jira/browse/SOLR-8307 > Project: Solr > Issue Type: Bug > Components: UI >Affects Versions: 5.3 >Reporter: Adam Johnson > Attachments: SOLR-8307.patch > > > Use the drop-down in the left menu to select a core. Use the “Watch Changes” > feature under the “Plugins / Stats” option. When submitting the changes, XML > is passed in the “stream.body” parameter and is vulnerable to XXE.
[jira] [Comment Edited] (SOLR-8307) XXE Vulnerability
[ https://issues.apache.org/jira/browse/SOLR-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011084#comment-15011084 ] Uwe Schindler edited comment on SOLR-8307 at 11/18/15 2:31 PM: --- I checked the code: Where is the XXE risk? The stream.body is going through a safe parser. So do you have a testcase? How did you find out that there is an XXE issue? I spent a whole week 2 years ago on fixing all these problems, so how could they reappear? There are also tests that check to prevent XXE at some places! The attached patch only fixes SolrJ, but this is not really a security issue, because it is used to connect to Solr and not arbitrary web sites. was (Author: thetaphi): I checked the code: Where is the XXE risk. The stream.body is going through a safe parser. So do you have a testcase? How did you find out that there is an XXE issue? I spent a whole week on fixing all this problems, so how could they reappear. There are also tests that check to prevent XXE at some places! The attached patch only fixes SolrJ, but this is not really a security issue, because it is used to connect to Solr and not arbitrary web sites. > XXE Vulnerability > - > > Key: SOLR-8307 > URL: https://issues.apache.org/jira/browse/SOLR-8307 > Project: Solr > Issue Type: Bug > Components: UI >Affects Versions: 5.3 >Reporter: Adam Johnson > Attachments: SOLR-8307.patch > > > Use the drop-down in the left menu to select a core. Use the “Watch Changes” > feature under the “Plugins / Stats” option. When submitting the changes, XML > is passed in the “stream.body” parameter and is vulnerable to XXE.
[jira] [Commented] (SOLR-8283) factor out SortSpecParsing[Test] from QueryParsing[Test]
[ https://issues.apache.org/jira/browse/SOLR-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012074#comment-15012074 ] ASF subversion and git services commented on SOLR-8283: --- Commit 1715073 from [~cpoerschke] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1715073 ] SOLR-8283: factor out SortSpecParsing[Test] from QueryParsing[Test] (merge in revision 1715049 from trunk) > factor out SortSpecParsing[Test] from QueryParsing[Test] > > > Key: SOLR-8283 > URL: https://issues.apache.org/jira/browse/SOLR-8283 > Project: Solr > Issue Type: Task >Reporter: Christine Poerschke >Assignee: Christine Poerschke > Attachments: SOLR-8283-part1of2.patch, SOLR-8283-part2of2.patch, > SOLR-8283.patch > > > patch to follow -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012076#comment-15012076 ] Yonik Seeley commented on SOLR-8220: The set of fields that have docValues and are not stored can be computed once per index snapshot (from the FieldInfos+schema). There should be no performance impact if there are no un-stored docValues fields in use. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? 
(Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" > -- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior
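The proposed fl semantics can be modeled as a small selection function. This is a hypothetical model of the proposal in the description, not Solr's implementation; field names and the method signature are invented for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

public class FlDemo {
    // Model of the proposed fl behavior:
    //   "*"   -> stored fields only (current behavior)
    //   "+"   -> stored fields plus un-stored docValues fields
    //   name  -> that field, served from stored if present, else from docValues
    static Set<String> select(String fl, Set<String> stored, Set<String> docValuesOnly) {
        switch (fl) {
            case "*":
                return stored;
            case "+": {
                Set<String> all = new TreeSet<>(stored);
                all.addAll(docValuesOnly);
                return all;
            }
            default:
                return (stored.contains(fl) || docValuesOnly.contains(fl))
                        ? Set.of(fl) : Set.of();
        }
    }

    public static void main(String[] args) {
        Set<String> stored = Set.of("title");
        Set<String> dvOnly = Set.of("price"); // docValues="true", stored="false"
        System.out.println(select("*", stored, dvOnly));     // [title]
        System.out.println(select("price", stored, dvOnly)); // [price]
        System.out.println(select("+", stored, dvOnly));     // [price, title]
    }
}
```

This also makes Yonik's point concrete: `docValuesOnly` is exactly the set that can be computed once per index snapshot from FieldInfos plus the schema, so when it is empty the "*" path is unchanged.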
[jira] [Closed] (SOLR-8313) Migrate to new slf4j logging implementation (log4j 1.x is EOL)
[ https://issues.apache.org/jira/browse/SOLR-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Heisey closed SOLR-8313. -- Resolution: Duplicate I believe our best option is eating our own dogfood and using log4j2. There is already an issue open to switch, closing this one as a duplicate. > Migrate to new slf4j logging implementation (log4j 1.x is EOL) > -- > > Key: SOLR-8313 > URL: https://issues.apache.org/jira/browse/SOLR-8313 > Project: Solr > Issue Type: Improvement > Components: Server >Reporter: Steve Davids > > Log4j 1.x was declared dead (EOL) in August 2015: > https://blogs.apache.org/foundation/entry/apache_logging_services_project_announces > Solr should migrate to a new slf4j logging implementation, the popular > choices these days seem to be either log4j2 or logback. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7887) Upgrade Solr to use log4j2 -- log4j 1 now officially end of life
[ https://issues.apache.org/jira/browse/SOLR-7887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012512#comment-15012512 ] Shawn Heisey commented on SOLR-7887: If there's significant support for logback instead of log4j2, we could go that route, but I think that staying within Apache is probably the best option. > Upgrade Solr to use log4j2 -- log4j 1 now officially end of life > > > Key: SOLR-7887 > URL: https://issues.apache.org/jira/browse/SOLR-7887 > Project: Solr > Issue Type: Task >Affects Versions: 5.2.1 >Reporter: Shawn Heisey > > The logging services project has officially announced the EOL of log4j 1: > https://blogs.apache.org/foundation/entry/apache_logging_services_project_announces > In the official binary jetty deployment, we use use log4j 1.2 as our final > logging destination, so the admin UI has a log watcher that actually uses > log4j and java.util.logging classes. That will need to be extended to add > log4j2. I think that might be the largest pain point to this upgrade. > There is some crossover between log4j2 and slf4j. Figuring out exactly which > jars need to be in the lib/ext directory will take some research. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Windows (32bit/jdk1.8.0_66) - Build # 5408 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/5408/ Java: 32bit/jdk1.8.0_66 -client -XX:+UseParallelGC 1 tests failed. FAILED: org.apache.solr.cloud.CollectionsAPIDistributedZkTest.test Error Message: Error from server at http://127.0.0.1:57636/i/ld/awholynewcollection_0: non ok status: 500, message:Server Error Stack Trace: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:57636/i/ld/awholynewcollection_0: non ok status: 500, message:Server Error at __randomizedtesting.SeedInfo.seed([70AB084CDDDC9A55:F8FF37967320F7AD]:0) at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:509) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230) at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:150) at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:943) at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:958) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.waitForNon403or404or503(AbstractFullDistribZkTestBase.java:1753) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testCollectionsAPI(CollectionsAPIDistributedZkTest.java:653) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.test(CollectionsAPIDistributedZkTest.java:155) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1764) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:871) at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:907) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:921) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:963) at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:938) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at
[jira] [Comment Edited] (SOLR-8281) Add RollupMergeStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012651#comment-15012651 ] Dennis Gove edited comment on SOLR-8281 at 11/19/15 2:30 AM: - To be honest I think this logic should live in the ParallelStream. As a user of this stream I would expect it to properly merge all workers together, including metrics calculated in those workers. That said, putting it in the ReducerStream is also a good idea. I'm on the fence as to which would be better. Adding too much to the ParallelStream might end up hurting us long-term. was (Author: dpgove): To be honest I think this logic should live in the ParallelStream. As a user of this stream I would expect it to properly merge all workers together, including metrics calculated in those workers. > Add RollupMergeStream to Streaming API > -- > > Key: SOLR-8281 > URL: https://issues.apache.org/jira/browse/SOLR-8281 > Project: Solr > Issue Type: Bug >Reporter: Joel Bernstein >Assignee: Joel Bernstein > > The RollupMergeStream merges the aggregate results emitted by the > RollupStream on *worker* nodes. > This is designed to be used in conjunction with the HashJoinStream to perform > rollup Aggregations on the joined Tuples. The HashJoinStream will require the > tuples to be partitioned on the Join keys. To avoid needing to repartition on > the *group by* fields for the RollupStream, we can perform a merge of the > rolled up Tuples coming from the workers. > The construct would look like this: > {code} > mergeRollup (... > parallel (... > rollup (... > hashJoin ( > search(...), > search(...), > on="fieldA" > ) > ) > ) >) > {code} > The pseudo code above would push the *hashJoin* and *rollup* to the *worker* > nodes. The emitted rolled up tuples would be merged by the mergeRollup. > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012580#comment-15012580 ] Yonik Seeley commented on SOLR-8220: bq. Does that sound fine? Yep. Seems like we will only have a perf issue when we have many sparse un-stored docValue fields. At that point it might make sense to have a separate docValues field that contains the list of fields for the document. That can be saved for a future optimization though. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? 
(Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
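The fl semantics proposed in the issue description reduce to a per-field source-resolution rule plus two wildcard modes. Below is a rough, hedged sketch of that proposal; the `Field` record, `Source` enum, and class name are hypothetical illustrations, not Solr APIs, and the actual patch may behave differently:

```java
import java.util.*;

public class FieldSourceSketch {
    public enum Source { STORED, DOC_VALUES, ABSENT }

    // Hypothetical schema entry: only the two flags that matter here.
    public record Field(boolean stored, boolean docValues) {}

    // Proposed rule for an explicit fl=<name>: prefer the stored value,
    // fall back to docValues when the field is docValues-only.
    public static Source resolve(Field f) {
        if (f.stored()) return Source.STORED;
        if (f.docValues()) return Source.DOC_VALUES;
        return Source.ABSENT;
    }

    // Proposed wildcard modes: fl="*" returns only stored fields (current
    // behavior), while fl="+" also includes docValues-only fields.
    public static List<String> select(String fl, Map<String, Field> schema) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Field> e : new TreeMap<>(schema).entrySet()) {
            Source s = resolve(e.getValue());
            if (s == Source.STORED || ("+".equals(fl) && s == Source.DOC_VALUES))
                out.add(e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Field> schema = Map.of(
            "title", new Field(true, false),   // stored only
            "price", new Field(true, true),    // stored and docValues
            "views", new Field(false, true));  // docValues only
        System.out.println("fl=* -> " + select("*", schema));  // [price, title]
        System.out.println("fl=+ -> " + select("+", schema));  // [price, title, views]
    }
}
```

Note the later comments in this thread lean toward making fl="*" transparent (returning non-stored docValues too), which would collapse the "*"/"+" distinction sketched above.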
[jira] [Comment Edited] (SOLR-8281) Add RollupMergeStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012646#comment-15012646 ] Joel Bernstein edited comment on SOLR-8281 at 11/19/15 2:24 AM: Thinking about this some more, possibly this is a job for the ReducerStream. We could add Operations to the reducer stream and have the operations perform the merge. If we went this route we would scrap the MergeRollupStream and change this ticket to "Add operations to the ReducerStream". was (Author: joel.bernstein): Thinking about this some more, possibly this is a job for the ReducerStream. We could add Operations to the reducer stream and have the operations perform the merge. In we went this route we would scrap the MergeRollupStream and change this ticket to "Add operations to the ReducerStream".
[jira] [Commented] (SOLR-8281) Add RollupMergeStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012646#comment-15012646 ] Joel Bernstein commented on SOLR-8281: -- Thinking about this some more, possibly this is a job for the ReducerStream. We could add Operations to the reducer stream and have the operations perform the merge. If we went this route we would scrap the MergeRollupStream and change this ticket to "Add operations to the ReducerStream".
[jira] [Comment Edited] (SOLR-8281) Add RollupMergeStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012651#comment-15012651 ] Dennis Gove edited comment on SOLR-8281 at 11/19/15 2:31 AM: - To be honest I think this logic should live in the ParallelStream. As a user of this stream I would expect it to properly merge all workers together, including metrics calculated in those workers. That said, putting it in the ReducerStream is also a good idea. I'm on the fence as to which would be better. Adding too much to the ParallelStream might end up hurting us long-term. was (Author: dpgove): To be honest I think this logic should live in the ParallelStream. As a user of this stream I would expect it to properly merge all workers together, including metrics calculated in those workers. That said, putting it in the ReducerStream is also a good idea. I'm on the fence as to which would be better. Adding to much to the ParallelStream might end up hurting us long-term.
[jira] [Commented] (SOLR-8281) Add RollupMergeStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012651#comment-15012651 ] Dennis Gove commented on SOLR-8281: --- To be honest I think this logic should live in the ParallelStream. As a user of this stream I would expect it to properly merge all workers together, including metrics calculated in those workers.
[jira] [Issue Comment Deleted] (SOLR-8281) Add RollupMergeStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8281: -- Comment: was deleted (was: To be honest I think this logic should live in the ParallelStream. As a user of this stream I would expect it to properly merge all workers together, including metrics calculated in those workers. )
[jira] [Commented] (SOLR-8281) Add RollupMergeStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012650#comment-15012650 ] Dennis Gove commented on SOLR-8281: --- To be honest I think this logic should live in the ParallelStream. As a user of this stream I would expect it to properly merge all workers together, including metrics calculated in those workers.
[jira] [Comment Edited] (SOLR-8281) Add RollupMergeStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012646#comment-15012646 ] Joel Bernstein edited comment on SOLR-8281 at 11/19/15 2:26 AM: Thinking about this some more, possibly this is a job for the ReducerStream. We could add Operations to the reducer stream and have the operations perform the merge. If we went this route we would scrap the MergeRollupStream and change this ticket to "Add operations to the ReducerStream". This would also provide a much more powerful ReducerStream for general use. was (Author: joel.bernstein): Thinking about this some more, possibly this is a job for the ReducerStream. We could add Operations to the reducer stream and have the operations perform the merge. If we went this route we would scrap the MergeRollupStream and change this ticket to "Add operations to the ReducerStream".
[jira] [Assigned] (SOLR-8307) XXE Vulnerability
[ https://issues.apache.org/jira/browse/SOLR-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher reassigned SOLR-8307: -- Assignee: Erik Hatcher > XXE Vulnerability > - > > Key: SOLR-8307 > URL: https://issues.apache.org/jira/browse/SOLR-8307 > Project: Solr > Issue Type: Bug > Components: UI >Affects Versions: 5.3 >Reporter: Adam Johnson >Assignee: Erik Hatcher >Priority: Blocker > Fix For: 5.4 > > Attachments: SOLR-8307.patch > > > Use the drop-down in the left menu to select a core. Use the “Watch Changes” > feature under the “Plugins / Stats” option. When submitting the changes, XML > is passed in the “stream.body” parameter and is vulnerable to XXE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
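The actual SOLR-8307 patch (Uwe's restrictive entity resolver in XMLResponseParser) is not shown in this thread. As a rough illustration of the same class of hardening, here is a minimal StAX sketch that refuses to resolve the DOCTYPE/external entity an attacker could smuggle in via stream.body. The class name and payload are hypothetical, and the real fix may differ in mechanism:

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;

public class XxeHardeningSketch {
    // Hypothetical XXE payload of the kind described in the report.
    public static final String PAYLOAD =
        "<?xml version=\"1.0\"?>" +
        "<!DOCTYPE r [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>" +
        "<r>&xxe;</r>";

    // Returns the character content of the document. With DTD and external
    // entity support disabled, the parser either rejects the DOCTYPE outright
    // or refuses to expand the entity; it never reads the referenced file.
    public static String parse(String xml) throws Exception {
        XMLInputFactory f = XMLInputFactory.newInstance();
        f.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        f.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        XMLStreamReader r = f.createXMLStreamReader(new StringReader(xml));
        StringBuilder text = new StringBuilder();
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.CHARACTERS) text.append(r.getText());
        }
        return text.toString();
    }

    public static void main(String[] args) {
        try {
            parse(PAYLOAD);
            System.out.println("parsed without entity expansion");
        } catch (Exception e) {
            // Rejecting the DOCTYPE entirely is also an acceptable outcome.
            System.out.println("rejected: DTD disallowed");
        }
    }
}
```

Either outcome (silent non-expansion or a hard parse failure) closes the file-disclosure vector; which one you get depends on the underlying StAX implementation.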
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012633#comment-15012633 ] Erick Erickson commented on SOLR-8220: -- Given that stored values are compressed in 16k blocks, and returning a single field from the stored data requires decompressing 16k, how much effect do LazyDocuments really have any more? I don't know, just askin'. Since docValues avoids decompressing 16k per doc, disk seeks and the like, I strongly suspect it is vastly more efficient than getting the stored values. That's how Streaming Aggregation can return 200k-400k docs/second. All that said, I suspect that there are negligible savings (or perhaps even costs) in mixing the two, i.e. if _any_ field to be returned is not DV, you might as well return all the fields from the stored data. Testing would tell though.
[jira] [Created] (SOLR-8313) Migrate to new slf4j logging implementation (log4j 1.x is EOL)
Steve Davids created SOLR-8313: -- Summary: Migrate to new slf4j logging implementation (log4j 1.x is EOL) Key: SOLR-8313 URL: https://issues.apache.org/jira/browse/SOLR-8313 Project: Solr Issue Type: Improvement Components: Server Reporter: Steve Davids Log4j 1.x was declared dead (EOL) in August 2015: https://blogs.apache.org/foundation/entry/apache_logging_services_project_announces Solr should migrate to a new slf4j logging implementation, the popular choices these days seem to be either log4j2 or logback. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-8290) remove SchemaField.checkFieldCacheSource's unused QParser argument
[ https://issues.apache.org/jira/browse/SOLR-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christine Poerschke resolved SOLR-8290. --- Resolution: Fixed Fix Version/s: Trunk 5.4 > remove SchemaField.checkFieldCacheSource's unused QParser argument > -- > > Key: SOLR-8290 > URL: https://issues.apache.org/jira/browse/SOLR-8290 > Project: Solr > Issue Type: Wish >Reporter: Christine Poerschke >Assignee: Christine Poerschke >Priority: Minor > Fix For: 5.4, Trunk > > Attachments: SOLR-8290.patch > > > From what I could see with a little looking around the argument was added in > 2011 but not used then or since. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7539) Add a QueryAutofilteringComponent for query introspection using indexed metadata
[ https://issues.apache.org/jira/browse/SOLR-7539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012182#comment-15012182 ] Markus Jelsma commented on SOLR-7539: - Hi Ted - I've read the code and your posts about this awesome feature, but never really knew how to apply it in real-world apps because of, among other things, the problem pointed out by jmlucjav. Regarding the current solution to that very problem: I feel it would introduce a cumbersome maintenance hazard for any shop or catalog site that has non-trivial data, so probably most :) This solution would require very frequent maintenance for any index that is fuelled by users or automated feeds, as new examples of said exceptions come and go and are not easy to spot. Would it not be simpler to detect these ambiguities and, instead of filtering, emit them to the ResponseBuilder so applications can deal with them? You have control of the SearchComponent, so you can let app developers put the question to users: do you want the performer, or the writer? In any case, filtering is bad here; boosting both writer and performer may be another (additional) way to deal with ambiguities. I fear that labour-intensive maintenance would render this cool feature unusable. What do you think? M. > Add a QueryAutofilteringComponent for query introspection using indexed > metadata > > > Key: SOLR-7539 > URL: https://issues.apache.org/jira/browse/SOLR-7539 > Project: Solr > Issue Type: New Feature >Reporter: Ted Sullivan >Priority: Minor > Fix For: Trunk > > Attachments: SOLR-7539.patch, SOLR-7539.patch, SOLR-7539.patch, > SOLR-7539.patch > > > The Query Autofiltering Component provides a method of inferring user intent > by matching noun phrases that are typically used for faceted-navigation into > Solr filter or boost queries (depending on configuration settings) so that > more precise user queries can be met with more precise results.
> The algorithm uses a "longest contiguous phrase match" strategy which allows > it to disambiguate queries where single terms are ambiguous but phrases are > not. It will work when there is structured information in the form of String > fields that are normally used for faceted navigation. It works across fields > by building a map of search term to index field using the Lucene FieldCache > (UninvertingReader). This enables users to create free text, multi-term > queries that combine attributes across facet fields - as if they had searched > and then navigated through several facet layers. To address the problem of > exact-match only semantics of String fields, support for synonyms (including > multi-term synonyms) and stemming was added. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8281) Add RollupMergeStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012457#comment-15012457 ] Dennis Gove commented on SOLR-8281: --- I'd see us needing to make a couple of changes. *RollupStream*: Instead of just adding a raw value to the tuple, this should add a tuple itself which contains metadata about the metric. Metadata is required to perform merges on certain metrics (such as mean). *MergeRollupStream*: The construction of this will first validate that all tuples in substreams are mergeable. It can do this by asking the substreams for the metrics it intends to calculate or return. Note that this requires a new function in the TupleStream interface whose job it is to return all metrics calculated in that stream or substreams. The read() implementation of this stream will need to read all tuples from the substream (most likely a ParallelStream, though it doesn't have to be). Each tuple will be added to a map with map\[tupleKey\] = tuple. tupleKey is whatever defines a unique tuple (i.e., the group-by fields). If a "same" tuple exists in the map already then the existing tuple and read tuple will be merged by calling existingTuple = metric.merge(existingTuple, readTuple) for each metric and then put back into the map. The end result is that the map contains the merged tuples. MergeRollupStream::read() will then return the first tuple from the map. Note, we can use a sorted map or some way to return sorted values from a map so that we can enforce some sort on the read tuples. Also, this allows us to effectively resort the stream to something useful for wrapping streams. I may be leaving something out, but I believe this approach (or at least the one I've designed in my head) will give us what we need. An open question is whether we return metrics containing this metadata from the MergeRollupStream or strip the metadata out; I think we should return it but am not wedded to that idea.
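Dennis's design above (map\[tupleKey\] = tuple, with metric.merge combining same-keyed tuples, and a sorted map to re-establish an order for wrapping streams) can be sketched roughly as follows. The Tuple record and its (sum, count) metadata are hypothetical stand-ins for the real Streaming API types; carrying that metadata instead of a precomputed value is exactly what lets a metric like mean merge correctly:

```java
import java.util.*;

public class MergeRollupSketch {
    // Hypothetical worker tuple: a group key plus metric metadata.
    public record Tuple(String groupKey, double sum, long count) {
        public double mean() { return sum / count; }
    }

    // Merge tuples from all workers, combining "same" tuples (same group key)
    // metric by metric. A TreeMap keeps the merged tuples sorted by key, so
    // wrapping streams see a defined order.
    public static Collection<Tuple> merge(List<Tuple> workerTuples) {
        TreeMap<String, Tuple> byKey = new TreeMap<>();
        for (Tuple t : workerTuples)
            byKey.merge(t.groupKey(), t,
                (a, b) -> new Tuple(a.groupKey(), a.sum() + b.sum(), a.count() + b.count()));
        return byKey.values();
    }

    public static void main(String[] args) {
        List<Tuple> fromWorkers = List.of(
            new Tuple("fieldA=x", 10.0, 2),  // from worker 1
            new Tuple("fieldA=x", 20.0, 3),  // from worker 2, same group
            new Tuple("fieldA=y", 6.0, 2));
        for (Tuple t : merge(fromWorkers))
            System.out.println(t.groupKey() + " mean=" + t.mean());
        // fieldA=x mean=6.0 (30/5) -- not the average of the two worker
        // means (5.0 and ~6.67), which is why the metadata must be carried.
    }
}
```

The open question in the comment (return the metadata or strip it) corresponds here to whether the merged Tuple or only its mean() is emitted downstream.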
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012035#comment-15012035 ] Ishan Chattopadhyaya commented on SOLR-8220: bq. Purely from an interface perspective (I haven't looked at the code), it feels like this should be transparent. That makes sense, having {{*}} return all stored and non-stored docvalues. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. > Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? 
(Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
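The fl semantics proposed above can be sketched as a toy lookup. This is illustrative only — the FlResolver class and the field names are hypothetical, not Solr code: a requested field is read from stored fields when a stored copy exists, otherwise it falls back to the docValues copy.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model of the proposed fl resolution rules; not Solr's implementation.
// A field named in fl is served from stored fields when stored, otherwise
// from docValues when available.
class FlResolver {
    static String source(String field, Set<String> stored, Set<String> docValues) {
        if (stored.contains(field)) {
            return "stored";      // stored copy takes precedence
        }
        if (docValues.contains(field)) {
            return "docValues";   // fall back to the docValues copy
        }
        return "absent";          // field not retrievable either way
    }

    public static void main(String[] args) {
        // Hypothetical schema: "title" is stored+docValues, "price" docValues-only.
        Set<String> stored = new HashSet<>(Arrays.asList("title"));
        Set<String> docValues = new HashSet<>(Arrays.asList("title", "price"));
        System.out.println(source("title", stored, docValues));  // stored
        System.out.println(source("price", stored, docValues));  // docValues
    }
}
```

Under the proposal, fl="*" would continue to return only stored fields (option 2b, the current behavior), while an explicitly named field would fall back to docValues as sketched.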
[jira] [Commented] (SOLR-8298) small preferLocalShards implementation refactor
[ https://issues.apache.org/jira/browse/SOLR-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012246#comment-15012246 ] Timothy Potter commented on SOLR-8298: -- [~cpoerschke] changes look good to me, thanks for cleaning this up a bit > small preferLocalShards implementation refactor > --- > > Key: SOLR-8298 > URL: https://issues.apache.org/jira/browse/SOLR-8298 > Project: Solr > Issue Type: Wish >Reporter: Christine Poerschke >Assignee: Christine Poerschke >Priority: Minor > Attachments: SOLR-8298.patch > > > Towards rebasing the SOLR-6730 patch after SOLR-6832 and other changes - > proposed patch against trunk to follow. > existing calling chain: > * {{ResponseBuilder.addRequest(... ShardRequest sreq)}} does {{sreq.rb = > this;}} so that later on {{HttpShardHandler.submit(ShardRequest sreq ...)}} > can do {{sreq.rb.req.getOriginalParams().getBool}} for > {{CommonParams.PREFER_LOCAL_SHARDS}} > proposed alternative calling chain: > * {{HttpShardHandler.prepDistributed(ResponseBuilder rb)}} sets > {{rb.preferredHostAddress}} and {{SearchHandler}} calls > {{ShardHandler.submit(ShardRequest sreq ... rb.preferredHostAddress)}} > structural changes: > * {{ShardRequest.rb}} member removed in favour of a new > {{ResponseBuilder.preferredHostAddress}} member. > * {{String preferredHostAddress}} argument added to the abstract > {{ShardHandler.submit}} method (and to two derived (test) classes' submit > methods also). > * {code}public void submit(ShardRequest sreq, String shard, > ModifiableSolrParams params) { submit(sreq, shard, params, null); } {code} > added to avoid having to change {{ShardHandler.submit}} callers which don't > have a concept of preferring a local shard e.g. for PeerSync requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-7539) Add a QueryAutofilteringComponent for query introspection using indexed metadata
[ https://issues.apache.org/jira/browse/SOLR-7539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012182#comment-15012182 ] Markus Jelsma edited comment on SOLR-7539 at 11/18/15 10:23 PM: Hi Ted - I've read the code and your posts about this awesome feature but never really knew how to apply it in real-world apps without, among other things, the problem pointed out by jmlucjav. So regarding the current solution to that very problem: I feel it would introduce a cumbersome maintenance hazard for any shop or catalog site that has non-trivial data, so probably most :) This solution would require very frequent maintenance for any index that is fed by users or automatically, as new examples of said exceptions come and go and are not easy to spot. Is it not simpler to detect these ambiguities and not filter, but emit the ambiguities to the ResponseBuilder so applications can deal with them? You have control of the SearchComponent, so you can let app developers put the question to users: do you want the performer, or the writer? In any case, filtering is bad here; boosting both writer and performer may be another (additional) way to deal with ambiguities. I fear labour-intensive maintenance would render this cool feature unusable. What do you think? M. was (Author: markus17): Hi Ted - i've read the code and your posts about this awesome feature but never really knew how to apply it in real world apps without a.o. the problem pointed out by jmlucjav. So regarding the current solution on that very problem; i feel it would introduce a cumbersome maintenance hazard to any shop or catalog site that has non trivial data, so probably most :) This solution would require very frequent maintenance for any index that is fueled by users or automatically as new examples of said exceptions come and go and are not easy to spot. 
Is it not a simpler idea to detect these ambiguities and not filter, but emit the ambiguities to the ResponseBuilder so applications can deal with it? You have the control of the SearchComponent so you can let app developers ask the question to users, do you want the performer, or the writer? In any case, filtering is bad here, boosting both writer and performer may be another (additional) solution to deal with ambiguities. I fear labour intense maintenance yields this cool feature unusable. What do you think? M. > Add a QueryAutofilteringComponent for query introspection using indexed > metadata > > > Key: SOLR-7539 > URL: https://issues.apache.org/jira/browse/SOLR-7539 > Project: Solr > Issue Type: New Feature >Reporter: Ted Sullivan >Priority: Minor > Fix For: Trunk > > Attachments: SOLR-7539.patch, SOLR-7539.patch, SOLR-7539.patch, > SOLR-7539.patch > > > The Query Autofiltering Component provides a method of inferring user intent > by matching noun phrases that are typically used for faceted-navigation into > Solr filter or boost queries (depending on configuration settings) so that > more precise user queries can be met with more precise results. > The algorithm uses a "longest contiguous phrase match" strategy which allows > it to disambiguate queries where single terms are ambiguous but phrases are > not. It will work when there is structured information in the form of String > fields that are normally used for faceted navigation. It works across fields > by building a map of search term to index field using the Lucene FieldCache > (UninvertingReader). This enables users to create free text, multi-term > queries that combine attributes across facet fields - as if they had searched > and then navigated through several facet layers. To address the problem of > exact-match only semantics of String fields, support for synonyms (including > multi-term synonyms) and stemming was added. 
[jira] [Updated] (SOLR-8280) Schemaless features don't work reliably with SolrCoreAware sim factory -- example: SchemaSimilarityFactory
[ https://issues.apache.org/jira/browse/SOLR-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-8280: --- Assignee: Hoss Man Affects Version/s: (was: Trunk) 5.0 Fix Version/s: Trunk 5.4 Description: SOLR-8271 uncovered problems with using SchemaSimilarityFactory + schemaless features. While the broader problems of SolrCoreAware objects inited after the SolrCore was live have been spun off into SOLR-8311 this issue focuses on fixing & testing the core problem of ensuring SchemaSimilarityFactory + schemaless features function together. {panel:title=original bug report} Something about the code path(s) involved in TestCloudSchemaless & ChangedSchemaMergeTest don't play nicely with a SimilarityFactory that is SolrCoreAware -- notably: SchemaSimilarityFactory. I discovered this while trying to implement SOLR-8271, but it can be reproduced trivially by modifying the schema-add-schema-fields-update-processor.xml file used by TestCloudSchemaless (and hardcoded in java schema used by ChangedSchemaMergeTest) to refer to SchemaSimilarityFactory explicitly. Other cloud tests (such as CollectionReloadTest) or cloud+schemaless (ex: TestCloudManagedSchema) tests don't seem to demonstrate the same problem. {panel} was: Something about the code path(s) involved in TestCloudSchemaless & ChangedSchemaMergeTest don't play nicely with a SimilarityFactory that is SolrCoreAware -- notably: SchemaSimilarityFactory. I discovered this while trying to implement SOLR-8271, but it can be reproduced trivially by modifying the schema-add-schema-fields-update-processor.xml file used by TestCloudSchemaless (and hardcoded in java schema used by ChangedSchemaMergeTest) to refer to SchemaSimilarityFactory explicitly. Other cloud tests (such as CollectionReloadTest) or cloud+schemaless (ex: TestCloudManagedSchema) tests don't seem to demonstrate the same problem. 
Summary: Schemaless features don't work reliably with SolrCoreAware sim factory -- example: SchemaSimilarityFactory (was: TestCloudSchemaless + ChangedSchemaMergeTest fail weirdly if you try to use SolrCoreAware sim factory: SchemaSimilarityFactory ) > Schemaless features don't work reliably with SolrCoreAware sim factory -- > example: SchemaSimilarityFactory > -- > > Key: SOLR-8280 > URL: https://issues.apache.org/jira/browse/SOLR-8280 > Project: Solr > Issue Type: Bug >Affects Versions: 5.0 >Reporter: Hoss Man >Assignee: Hoss Man > Fix For: 5.4, Trunk > > Attachments: SOLR-8280.patch, SOLR-8280.patch, SOLR-8280.patch, > SOLR-8280.patch, SOLR-8280__broken__resource_loader_experiment.patch > > > SOLR-8271 uncovered problems with using SchemaSimilarityFactory + schemaless > features. While the broader problems of SolrCoreAware objects inited after > the SolrCore was live have been spun off into SOLR-8311 this issue focuses on > fixing & testing the core problem of ensuring SchemaSimilarityFactory + > schemaless features function together. > {panel:title=original bug report} > Something about the code path(s) involved in TestCloudSchemaless & > ChangedSchemaMergeTest don't play nicely with a SimilarityFactory that is > SolrCoreAware -- notably: SchemaSimilarityFactory. > I discovered this while trying to implement SOLR-8271, but it can be > reproduced trivially by modifying the > schema-add-schema-fields-update-processor.xml file used by > TestCloudSchemaless (and hardcoded in java schema used by > ChangedSchemaMergeTest) to refer to SchemaSimilarityFactory explicitly. > Other cloud tests (such as CollectionReloadTest) or cloud+schemaless (ex: > TestCloudManagedSchema) tests don't seem to demonstrate the same problem. > {panel} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-EA] Lucene-Solr-trunk-Linux (64bit/jdk1.9.0-ea-b90) - Build # 14957 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/14957/ Java: 64bit/jdk1.9.0-ea-b90 -XX:-UseCompressedOops -XX:+UseParallelGC 3 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.cloud.CollectionsAPIDistributedZkTest Error Message: 5 threads leaked from SUITE scope at org.apache.solr.cloud.CollectionsAPIDistributedZkTest: 1) Thread[id=8609, name=zkCallback-1185-thread-2, state=TIMED_WAITING, group=TGRP-CollectionsAPIDistributedZkTest] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:218) at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:461) at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362) at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:937) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1082) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1143) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:632) at java.lang.Thread.run(Thread.java:747)2) Thread[id=8355, name=TEST-CollectionsAPIDistributedZkTest.test-seed#[931D8B8225C0C481]-EventThread, state=WAITING, group=TGRP-CollectionsAPIDistributedZkTest] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:178) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2061) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:494) 3) Thread[id=8610, name=zkCallback-1185-thread-3, state=TIMED_WAITING, group=TGRP-CollectionsAPIDistributedZkTest] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:218) at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:461) at 
java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362) at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:937) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1082) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1143) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:632) at java.lang.Thread.run(Thread.java:747)4) Thread[id=8354, name=TEST-CollectionsAPIDistributedZkTest.test-seed#[931D8B8225C0C481]-SendThread(127.0.0.1:37655), state=TIMED_WAITING, group=TGRP-CollectionsAPIDistributedZkTest] at java.lang.Thread.sleep(Native Method) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:994)5) Thread[id=8356, name=zkCallback-1185-thread-1, state=TIMED_WAITING, group=TGRP-CollectionsAPIDistributedZkTest] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:218) at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:461) at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362) at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:937) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1082) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1143) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:632) at java.lang.Thread.run(Thread.java:747) Stack Trace: com.carrotsearch.randomizedtesting.ThreadLeakError: 5 threads leaked from SUITE scope at org.apache.solr.cloud.CollectionsAPIDistributedZkTest: 1) Thread[id=8609, name=zkCallback-1185-thread-2, state=TIMED_WAITING, group=TGRP-CollectionsAPIDistributedZkTest] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:218) at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:461) 
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362) at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:937) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1082) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1143) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:632) at java.lang.Thread.run(Thread.java:747) 2) Thread[id=8355, name=TEST-CollectionsAPIDistributedZkTest.test-seed#[931D8B8225C0C481]-EventThread,
[jira] [Updated] (SOLR-8280) TestCloudSchemaless + ChangedSchemaMergeTest fail weirdly if you try to use SolrCoreAware sim factory: SchemaSimilarityFactory
[ https://issues.apache.org/jira/browse/SOLR-8280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-8280: --- Attachment: SOLR-8280.patch New in this patch... * cleanup & beef up nocommit comments to point to new SOLR-8311 tracking jira * beefed up ChangedSchemaMergeTest to actually change the sim used in each schema & verify it's updated (and fully functional) * put some sanity checks in TestBulkSchemaAPI.testMultipleCommands ** already had some basic verification that adding a fieldtype w/sim + field using that type worked ** now it whitebox verifies that the underlying SimilarityFactory for the latest schema is and returns the expected Sim for each field. ...still testing, but I think this is good to go. > TestCloudSchemaless + ChangedSchemaMergeTest fail weirdly if you try to use > SolrCoreAware sim factory: SchemaSimilarityFactory > --- > > Key: SOLR-8280 > URL: https://issues.apache.org/jira/browse/SOLR-8280 > Project: Solr > Issue Type: Bug >Affects Versions: Trunk >Reporter: Hoss Man > Attachments: SOLR-8280.patch, SOLR-8280.patch, SOLR-8280.patch, > SOLR-8280.patch, SOLR-8280__broken__resource_loader_experiment.patch > > > Something about the code path(s) involved in TestCloudSchemaless & > ChangedSchemaMergeTest don't play nicely with a SimilarityFactory that is > SolrCoreAware -- notably: SchemaSimilarityFactory. > I discovered this while trying to implement SOLR-8271, but it can be > reproduced trivially by modifying the > schema-add-schema-fields-update-processor.xml file used by > TestCloudSchemaless (and hardcoded in java schema used by > ChangedSchemaMergeTest) to refer to SchemaSimilarityFactory explicitly. > Other cloud tests (such as CollectionReloadTest) or cloud+schemaless (ex: > TestCloudManagedSchema) tests don't seem to demonstrate the same problem. 
[jira] [Commented] (SOLR-8307) XXE Vulnerability
[ https://issues.apache.org/jira/browse/SOLR-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012731#comment-15012731 ] Erik Hatcher commented on SOLR-8307: Solr's "ant test" passed locally. I'll commit to trunk and branch_5x in the next day or two, barring any objections. > XXE Vulnerability > - > > Key: SOLR-8307 > URL: https://issues.apache.org/jira/browse/SOLR-8307 > Project: Solr > Issue Type: Bug > Components: UI >Affects Versions: 5.3 >Reporter: Adam Johnson >Assignee: Erik Hatcher >Priority: Blocker > Fix For: 5.4 > > Attachments: SOLR-8307.patch, SOLR-8307.patch > > > Use the drop-down in the left menu to select a core. Use the “Watch Changes” > feature under the “Plugins / Stats” option. When submitting the changes, XML > is passed in the “stream.body” parameter and is vulnerable to XXE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8309) TestRandomRequestDistribution test failures
[ https://issues.apache.org/jira/browse/SOLR-8309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated SOLR-8309: Attachment: SOLR-8309.patch Patch which waits for the cluster state to be updated by checking how the queries are being processed > TestRandomRequestDistribution test failures > --- > > Key: SOLR-8309 > URL: https://issues.apache.org/jira/browse/SOLR-8309 > Project: Solr > Issue Type: Bug >Reporter: Varun Thacker >Assignee: Varun Thacker > Attachments: SOLR-8309.patch, build-3774.txt, build-624.txt > > > There have been a couple of Jenkins failures for > TestRandomRequestDistribution . > Creating a Jira to track it -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Multiple Classes sharing loggers
Hi devs, While tracing through some log statements, I noticed that not all of the classes are using their own loggers. In particular, there are several instances of using SolrCore.log outside of the SolrCore class. There are also a few places where we use an inherited logger, like in DirectUpdateHandler2. Are these uses intentional? Sometimes they might make the logs more logically grouped, but they can also make it more difficult to find where execution was taking place if a non-namesake logger is used. If people agree that it's worth correcting, I'm happy to file JIRA(s) and provide the necessary patches. Thanks, Mike
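If this does get corrected, the per-class pattern is a one-line change per class. A minimal sketch, using java.util.logging so it is self-contained (Solr itself uses SLF4J, and ExampleUpdateHandler is a made-up class, not a real Solr one):

```java
import java.util.logging.Logger;

// Per-class logger pattern: each class owns a logger named after itself,
// so log records identify the class where execution actually took place,
// instead of being attributed to a shared logger like SolrCore.log.
class ExampleUpdateHandler {
    static final Logger log = Logger.getLogger(ExampleUpdateHandler.class.getName());

    void commit() {
        log.info("start commit");  // attributed to ExampleUpdateHandler
    }
}
```

With SLF4J the equivalent declaration would hand `LoggerFactory.getLogger` the enclosing class, but the grouping benefit is the same either way: grepping the logs for the class name finds exactly its own statements.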
[jira] [Updated] (SOLR-8307) XXE Vulnerability
[ https://issues.apache.org/jira/browse/SOLR-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher updated SOLR-8307: --- Attachment: SOLR-8307.patch Here's a patch that does `EmptyEntityResolver.configureXMLInputFactory` for the SolrInfoMBeanHandler diff feature, including test case that fails without the fix. > XXE Vulnerability > - > > Key: SOLR-8307 > URL: https://issues.apache.org/jira/browse/SOLR-8307 > Project: Solr > Issue Type: Bug > Components: UI >Affects Versions: 5.3 >Reporter: Adam Johnson >Assignee: Erik Hatcher >Priority: Blocker > Fix For: 5.4 > > Attachments: SOLR-8307.patch, SOLR-8307.patch > > > Use the drop-down in the left menu to select a core. Use the “Watch Changes” > feature under the “Plugins / Stats” option. When submitting the changes, XML > is passed in the “stream.body” parameter and is vulnerable to XXE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
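For background, StAX-level XXE hardening of this general kind looks roughly like the following. This is a generic sketch using the standard javax.xml.stream properties, not the actual body of EmptyEntityResolver.configureXMLInputFactory:

```java
import javax.xml.stream.XMLInputFactory;

// Generic XXE hardening for a StAX parser: refuse DTDs and external
// entity expansion before the factory ever parses untrusted XML such
// as the stream.body parameter. A sketch only, not Solr's exact code.
class SafeStaxFactory {
    static XMLInputFactory newHardenedFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        return factory;
    }
}
```

As its class name suggests, Solr's EmptyEntityResolver approach additionally resolves any entity reference to empty input rather than fetching it, so a parser that still encounters a DTD cannot pull in external content.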
[jira] [Comment Edited] (SOLR-8307) XXE Vulnerability
[ https://issues.apache.org/jira/browse/SOLR-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012694#comment-15012694 ] Erik Hatcher edited comment on SOLR-8307 at 11/19/15 3:15 AM: -- Here's a patch that does `EmptyEntityResolver.configureXMLInputFactory` for the SolrInfoMBeanHandler diff feature, including test case that fails without the fix. My patch includes a move of EmptyEntityResolver from solr-core to solrj too was (Author: ehatcher): Here's a patch that does `EmptyEntityResolver.configureXMLInputFactory` for the SolrInfoMBeanHandler diff feature, including test case that fails without the fix. > XXE Vulnerability > - > > Key: SOLR-8307 > URL: https://issues.apache.org/jira/browse/SOLR-8307 > Project: Solr > Issue Type: Bug > Components: UI >Affects Versions: 5.3 >Reporter: Adam Johnson >Assignee: Erik Hatcher >Priority: Blocker > Fix For: 5.4 > > Attachments: SOLR-8307.patch, SOLR-8307.patch > > > Use the drop-down in the left menu to select a core. Use the “Watch Changes” > feature under the “Plugins / Stats” option. When submitting the changes, XML > is passed in the “stream.body” parameter and is vulnerable to XXE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012826#comment-15012826 ] Yonik Seeley commented on SOLR-8220: Back in the day, LazyField actually had a pointer directly into the index where the field value could be read. That got removed from Lucene at some point, and was replaced with something just for compat's sake IIRC that had an N^2 bug... the doc was loaded on each lazy-field access, which Hoss found/fixed. But that leaves less performance benefit to using LazyDocument. On a quick look, it seems to load all lazy fields at once when the first lazy field is touched. I guess these days it's more of a memory optimization than a performance one. Might be worth considering new approaches (we can break back compat in trunk for 6.0). Or maybe subclass LazyDocument and do something different for docValues fields. > Read field from docValues for non stored fields > --- > > Key: SOLR-8220 > URL: https://issues.apache.org/jira/browse/SOLR-8220 > Project: Solr > Issue Type: Improvement >Reporter: Keith Laban > Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, > SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, > SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch > > > Many times a value will be both stored="true" and docValues="true" which > requires redundant data to be stored on disk. Since reading from docValues is > both efficient and a common practice (facets, analytics, streaming, etc), > reading values from docValues when a stored version of the field does not > exist would be a valuable disk usage optimization. > The only caveat with this that I can see would be for multiValued fields as > they would always be returned sorted in the docValues approach. I believe > this is a fair compromise. > I've done a rough implementation for this as a field transform, but I think > it should live closer to where stored fields are loaded in the > SolrIndexSearcher. 
> Two open questions/observations: > 1) There doesn't seem to be a standard way to read values for docValues, > facets, analytics, streaming, etc, all seem to be doing their own ways, > perhaps some of this logic should be centralized. > 2) What will the API behavior be? (Below is my proposed implementation) > Parameters for fl: > - fl="docValueField" > -- return field from docValue if the field is not stored and in docValues, > if the field is stored return it from stored fields > - fl="*" > -- return only stored fields > - fl="+" >-- return stored fields and docValue fields > 2a - would be easiest implementation and might be sufficient for a first > pass. 2b - is current behavior -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8307) XXE Vulnerability
[ https://issues.apache.org/jira/browse/SOLR-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012705#comment-15012705 ] Erik Hatcher commented on SOLR-8307: Addressing [~elyograg]'s list above: * org.apache.solr.handler.DocumentAnalysisRequestHandler: uses EmptyEntityResolver appropriately * org.apache.solr.handler.XmlUpdateRequestHandlerTest: this is a test, so no concern (but it does not use EmptyEntityResolver) * org.apache.solr.handler.dataimport.XPathRecordReader: uses EmptyEntityResolver * org.apache.solr.handler.loader.XMLLoader: uses EmptyEntityResolver * org.apache.solr.update.AddBlockUpdateTest: another test, so no concern, but it also does not use EmptyEntityResolver * org.apache.solr.util.EmptyEntityResolver: this is the fix to potentially evil external entity references So all those look fine. > XXE Vulnerability > - > > Key: SOLR-8307 > URL: https://issues.apache.org/jira/browse/SOLR-8307 > Project: Solr > Issue Type: Bug > Components: UI >Affects Versions: 5.3 >Reporter: Adam Johnson >Assignee: Erik Hatcher >Priority: Blocker > Fix For: 5.4 > > Attachments: SOLR-8307.patch, SOLR-8307.patch > > > Use the drop-down in the left menu to select a core. Use the “Watch Changes” > feature under the “Plugins / Stats” option. When submitting the changes, XML > is passed in the “stream.body” parameter and is vulnerable to XXE.
[jira] [Comment Edited] (SOLR-8281) Add RollupMergeStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012755#comment-15012755 ] Joel Bernstein edited comment on SOLR-8281 at 11/19/15 4:12 AM: Early versions of the ParallelStream handled the merging of Rollups. But I pulled it out because I felt this needed more thought. The nice thing about adding operations to the ReducerStream is that it makes the ReducerStream much more useful. So even if we don't use it to merge Rollups, it's worth doing. But this construct seems nice: {code} reduce (... parallel (... rollup (... hashJoin ( search(...), search(...), on="fieldA" ) ) ) ) {code} Actually this is even nicer: {code} reduce (... parallel (... reduce (... hashJoin ( search(...), search(...), on="fieldA" ) ) ) ) {code} In this case the ReducerStream replaces the RollupStream. To support this we would need an Operation to rollup the Metrics. was (Author: joel.bernstein): Early versions of the ParallelStream handled the merging of Rollups. But I pulled it out because I felt this needed more thought. The nice thing about adding operations to the ReducerStream is that it makes the ReducerStream much more useful. So even we don't use to merge Rollups it's worth doing. But this construct seems nice: {code} reduce (... parallel (... rollup (... hashJoin ( search(...), search(...), on="fieldA" ) ) ) ) {code} Actually this is even nicer {code} reduce (... parallel (... reduce (... hashJoin ( search(...), search(...), on="fieldA" ) ) ) ) {code} In this case the ReducerStream replaces the RollupStream. To support this we would need an Operation to rollup the Metrics. > Add RollupMergeStream to Streaming API > -- > > Key: SOLR-8281 > URL: https://issues.apache.org/jira/browse/SOLR-8281 > Project: Solr > Issue Type: Bug >Reporter: Joel Bernstein >Assignee: Joel Bernstein > > The RollupMergeStream merges the aggregate results emitted by the > RollupStream on *worker* nodes. 
> This is designed to be used in conjunction with the HashJoinStream to perform > rollup Aggregations on the joined Tuples. The HashJoinStream will require the > tuples to be partitioned on the Join keys. To avoid needing to repartition on > the *group by* fields for the RollupStream, we can perform a merge of the > rolled up Tuples coming from the workers. > The construct would like this: > {code} > mergeRollup (... > parallel (... > rollup (... > hashJoin ( > search(...), > search(...), > on="fieldA" > ) > ) > ) >) > {code} > The pseudo code above would push the *hashJoin* and *rollup* to the *worker* > nodes. The emitted rolled up tuples would be merged by the
[jira] [Commented] (SOLR-8281) Add RollupMergeStream to Streaming API
[ https://issues.apache.org/jira/browse/SOLR-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012755#comment-15012755 ] Joel Bernstein commented on SOLR-8281: -- Early versions of the ParallelStream handled the merging of Rollups. But I pulled it out because I felt this needed more thought. The nice thing about adding operations to the ReducerStream is that it makes the ReducerStream much more useful. So even if we don't use it to merge Rollups, it's worth doing. But this construct seems nice: {code} reduce (... parallel (... rollup (... hashJoin ( search(...), search(...), on="fieldA" ) ) ) ) {code} Actually this is even nicer: {code} reduce (... parallel (... reduce (... hashJoin ( search(...), search(...), on="fieldA" ) ) ) ) {code} In this case the ReducerStream replaces the RollupStream. To support this we would need an Operation to rollup the Metrics. > Add RollupMergeStream to Streaming API > -- > > Key: SOLR-8281 > URL: https://issues.apache.org/jira/browse/SOLR-8281 > Project: Solr > Issue Type: Bug >Reporter: Joel Bernstein >Assignee: Joel Bernstein > > The RollupMergeStream merges the aggregate results emitted by the > RollupStream on *worker* nodes. > This is designed to be used in conjunction with the HashJoinStream to perform > rollup Aggregations on the joined Tuples. The HashJoinStream will require the > tuples to be partitioned on the Join keys. To avoid needing to repartition on > the *group by* fields for the RollupStream, we can perform a merge of the > rolled up Tuples coming from the workers. > The construct would look like this: > {code} > mergeRollup (... > parallel (... > rollup (... > hashJoin ( > search(...), > search(...), > on="fieldA" > ) > ) > ) >) > {code} > The pseudo code above would push the *hashJoin* and *rollup* to the *worker* > nodes. The emitted rolled up tuples would be merged by the mergeRollup. 
[JENKINS-EA] Lucene-Solr-trunk-Linux (64bit/jdk1.9.0-ea-b90) - Build # 14960 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/14960/
Java: 64bit/jdk1.9.0-ea-b90 -XX:+UseCompressedOops -XX:+UseSerialGC

3 tests failed.

FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.CollectionsAPIDistributedZkTest

Error Message:
6 threads leaked from SUITE scope at org.apache.solr.cloud.CollectionsAPIDistributedZkTest:
   1) Thread[id=6176, name=zkCallback-946-thread-1, state=TIMED_WAITING, group=TGRP-CollectionsAPIDistributedZkTest]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:218)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:461)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
        at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:937)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1082)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1143)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:632)
        at java.lang.Thread.run(Thread.java:747)
   2) Thread[id=6175, name=TEST-CollectionsAPIDistributedZkTest.test-seed#[B9E9A6EBF5A9D79B]-EventThread, state=WAITING, group=TGRP-CollectionsAPIDistributedZkTest]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:178)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2061)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:494)
   3) Thread[id=6453, name=zkCallback-946-thread-4, state=TIMED_WAITING, group=TGRP-CollectionsAPIDistributedZkTest]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:218)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:461)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
        at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:937)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1082)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1143)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:632)
        at java.lang.Thread.run(Thread.java:747)
   4) Thread[id=6451, name=zkCallback-946-thread-2, state=TIMED_WAITING, group=TGRP-CollectionsAPIDistributedZkTest]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:218)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:461)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
        at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:937)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1082)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1143)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:632)
        at java.lang.Thread.run(Thread.java:747)
   5) Thread[id=6174, name=TEST-CollectionsAPIDistributedZkTest.test-seed#[B9E9A6EBF5A9D79B]-SendThread(127.0.0.1:48730), state=TIMED_WAITING, group=TGRP-CollectionsAPIDistributedZkTest]
        at java.lang.Thread.sleep(Native Method)
        at org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:101)
        at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:940)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1003)
   6) Thread[id=6452, name=zkCallback-946-thread-3, state=TIMED_WAITING, group=TGRP-CollectionsAPIDistributedZkTest]
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:218)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:461)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
        at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:937)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1082)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1143)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:632)
        at java.lang.Thread.run(Thread.java:747)

Stack Trace:
com.carrotsearch.randomizedtesting.ThreadLeakError: 6 threads leaked from
[JENKINS] Lucene-Solr-trunk-Solaris (64bit/jdk1.8.0) - Build # 199 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Solaris/199/
Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC

3 tests failed.

FAILED:  org.apache.solr.schema.TestCloudManagedSchema.test

Error Message:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:54701 within 3 ms

Stack Trace:
org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:54701 within 3 ms
	at __randomizedtesting.SeedInfo.seed([92C651C6E18112BF:1A926E1C4F7D7F47]:0)
	at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:181)
	at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:115)
	at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:110)
	at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:97)
	at org.apache.solr.cloud.AbstractDistribZkTestBase.printLayout(AbstractDistribZkTestBase.java:278)
	at org.apache.solr.cloud.AbstractFullDistribZkTestBase.distribTearDown(AbstractFullDistribZkTestBase.java:1474)
	at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:940)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:809)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:460)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:880)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:781)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:816)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:827)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:54701 within 3 ms
	at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:208)
	at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:173)
	... 37 more

FAILED:
[jira] [Updated] (LUCENE-6801) PhraseQuery incorrectly advertises it supports terms at the same position
[ https://issues.apache.org/jira/browse/LUCENE-6801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated LUCENE-6801:
---------------------------------
    Attachment: LUCENE_6801.patch

Here's an updated patch. I added a simple test, ported from one in MultiPhraseQuery. I adjusted the class javadocs of both query classes a bit, as well as their methods that add terms at specified positions.

> PhraseQuery incorrectly advertises it supports terms at the same position
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-6801
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6801
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>            Reporter: David Smiley
>            Priority: Minor
>         Attachments: LUCENE_6801.patch, LUCENE_6801.patch
>
> The following in PhraseQuery has been here since Sept 15th 2004 (by "goller"):
> {code:java}
> /**
>  * Adds a term to the end of the query phrase.
>  * The relative position of the term within the phrase is specified explicitly.
>  * This allows e.g. phrases with more than one term at the same position
>  * or phrases with gaps (e.g. in connection with stopwords).
>  */
> public Builder add(Term term, int position) {
> {code}
> Of course this isn't true; it's why we have MultiPhraseQuery. Yet we even
> allow you to have consecutive terms with the same positions. We shouldn't
> allow that; we should throw an exception. For my own sanity, I modified a
> simple MultiPhraseQuery test to use PhraseQuery instead and of course it
> didn't work.
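The fix argued for in this issue amounts to a guard in the builder: positions must strictly increase, while gaps (for stopwords) remain legal. A simplified, self-contained sketch of that check; this is not Lucene's actual PhraseQuery.Builder, just an illustration of the proposed validation under assumed names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini-builder illustrating the validation the patch calls for:
// reject a term added at a position that does not advance past the previous
// one, since handling multiple terms at one position is MultiPhraseQuery's job.
public class PositionCheckSketch {

    public static final class Builder {
        private final List<String> terms = new ArrayList<>();
        private int lastPosition = -1;

        public Builder add(String term, int position) {
            if (position <= lastPosition) {
                throw new IllegalArgumentException(
                        "Positions must increase: got " + position + " after " + lastPosition);
            }
            terms.add(term);
            lastPosition = position;
            return this;
        }

        public int size() { return terms.size(); }
    }

    public static void main(String[] args) {
        // A gap between positions 0 and 2 (e.g. a removed stopword) is fine.
        Builder b = new Builder().add("quick", 0).add("fox", 2);
        boolean threw = false;
        try {
            b.add("brown", 2); // same position as "fox" -> rejected
        } catch (IllegalArgumentException expected) {
            threw = true;
        }
        System.out.println(b.size() + " " + threw); // 2 true
    }
}
```

Throwing at add() time rather than at rewrite or search time surfaces the misuse where it happens, which is the behavior change the issue proposes.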
[JENKINS] Lucene-Solr-5.x-Linux (64bit/jdk1.8.0_66) - Build # 14669 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/14669/
Java: 64bit/jdk1.8.0_66 -XX:+UseCompressedOops -XX:+UseG1GC

1 tests failed.

FAILED:  org.apache.lucene.search.TestGeoPointQuery.testRandomTiny

Error Message:
Captured an uncaught exception in thread: Thread[id=24, name=T2, state=RUNNABLE, group=TGRP-TestGeoPointQuery]

Stack Trace:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=24, name=T2, state=RUNNABLE, group=TGRP-TestGeoPointQuery]
	at __randomizedtesting.SeedInfo.seed([D7191653E30FA9A2:9E5EC815BD2E910E]:0)
Caused by: java.lang.AssertionError
	at __randomizedtesting.SeedInfo.seed([D7191653E30FA9A2]:0)
	at org.apache.lucene.search.GeoPointTermsEnum.<init>(GeoPointTermsEnum.java:65)
	at org.apache.lucene.search.GeoPointDistanceQueryImpl$GeoPointRadiusTermsEnum.<init>(GeoPointDistanceQueryImpl.java:55)
	at org.apache.lucene.search.GeoPointDistanceQueryImpl.getTermsEnum(GeoPointDistanceQueryImpl.java:44)
	at org.apache.lucene.search.MultiTermQuery.getTermsEnum(MultiTermQuery.java:318)
	at org.apache.lucene.search.GeoPointTermQueryConstantScoreWrapper$1.getDocIDs(GeoPointTermQueryConstantScoreWrapper.java:72)
	at org.apache.lucene.search.GeoPointTermQueryConstantScoreWrapper$1.bulkScorer(GeoPointTermQueryConstantScoreWrapper.java:117)
	at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.cache(LRUQueryCache.java:598)
	at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.bulkScorer(LRUQueryCache.java:646)
	at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69)
	at org.apache.lucene.search.BooleanWeight.booleanScorer(BooleanWeight.java:198)
	at org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:239)
	at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.cache(LRUQueryCache.java:598)
	at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.bulkScorer(LRUQueryCache.java:646)
	at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69)
	at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:818)
	at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:92)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:535)
	at org.apache.lucene.util.BaseGeoPointTestCase$VerifyHits.test(BaseGeoPointTestCase.java:496)
	at org.apache.lucene.util.BaseGeoPointTestCase$2._run(BaseGeoPointTestCase.java:753)
	at org.apache.lucene.util.BaseGeoPointTestCase$2.run(BaseGeoPointTestCase.java:618)

Build Log:
[...truncated 8127 lines...]
   [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
   [junit4] IGNOR/A 0.01s J1 | TestGeoPointQuery.testAllLonEqual
   [junit4]    > Assumption #1: 'nightly' test group is disabled (@Nightly())
   [junit4] IGNOR/A 0.00s J1 | TestGeoPointQuery.testMultiValued
   [junit4]    > Assumption #1: 'nightly' test group is disabled (@Nightly())
   [junit4] IGNOR/A 0.00s J1 | TestGeoPointQuery.testSamePointManyTimes
   [junit4]    > Assumption #1: 'nightly' test group is disabled (@Nightly())
   [junit4] IGNOR/A 0.00s J1 | TestGeoPointQuery.testAllLatEqual
   [junit4]    > Assumption #1: 'nightly' test group is disabled (@Nightly())
   [junit4]   2> nov 19, 2015 4:09:13 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
   [junit4]   2> WARNING: Uncaught exception in thread: Thread[T2,5,TGRP-TestGeoPointQuery]
   [junit4]   2> java.lang.AssertionError
   [junit4]   2> 	at __randomizedtesting.SeedInfo.seed([D7191653E30FA9A2]:0)
   [junit4]   2> 	at org.apache.lucene.search.GeoPointTermsEnum.<init>(GeoPointTermsEnum.java:65)
   [junit4]   2> 	at org.apache.lucene.search.GeoPointDistanceQueryImpl$GeoPointRadiusTermsEnum.<init>(GeoPointDistanceQueryImpl.java:55)
   [junit4]   2> 	at org.apache.lucene.search.GeoPointDistanceQueryImpl.getTermsEnum(GeoPointDistanceQueryImpl.java:44)
   [junit4]   2> 	at org.apache.lucene.search.MultiTermQuery.getTermsEnum(MultiTermQuery.java:318)
   [junit4]   2> 	at org.apache.lucene.search.GeoPointTermQueryConstantScoreWrapper$1.getDocIDs(GeoPointTermQueryConstantScoreWrapper.java:72)
   [junit4]   2> 	at org.apache.lucene.search.GeoPointTermQueryConstantScoreWrapper$1.bulkScorer(GeoPointTermQueryConstantScoreWrapper.java:117)
   [junit4]   2> 	at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.cache(LRUQueryCache.java:598)
   [junit4]   2> 	at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.bulkScorer(LRUQueryCache.java:646)
   [junit4]   2> 	at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69)
   [junit4]   2> 	at
[jira] [Commented] (LUCENE-6900) Grouping sortWithinGroup should use Sort.RELEVANCE to indicate that, not null
[ https://issues.apache.org/jira/browse/LUCENE-6900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013006#comment-15013006 ]

Martijn van Groningen commented on LUCENE-6900:
-----------------------------------------------

No, there isn't. What I remember is that when this class was created, it was common to use 'null' as an indication to sort by relevancy.

> Grouping sortWithinGroup should use Sort.RELEVANCE to indicate that, not null
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-6900
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6900
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/grouping
>            Reporter: David Smiley
>            Priority: Minor
>
> In AbstractSecondPassGroupingCollector, {{withinGroupSort}} uses a value of
> null to indicate a relevance sort. I think it's nicer to use Sort.RELEVANCE
> for this -- after all it's how the {{groupSort}} variable is handled. This
> choice is also seen in GroupingSearch; likely some other collaborators too.
> [~martijn.v.groningen] is there some wisdom in the current choice that
> escapes me? If not I'll post a patch.
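The change being proposed here is the classic sentinel-object pattern: a shared well-known instance instead of null. A self-contained sketch with hypothetical classes (this is not Lucene's actual Sort API) shows why collaborators get simpler when null is normalized once at the boundary:

```java
// Hypothetical sketch of the sentinel-vs-null choice: a shared RELEVANCE
// constant lets every collaborator compare against one well-known object
// instead of special-casing null in each place the sort is consumed.
public class SortSentinelSketch {

    /** Stand-in for a sort specification, with a well-known relevance sentinel. */
    public static final class Sort {
        public static final Sort RELEVANCE = new Sort("relevance");
        public final String description;
        public Sort(String description) { this.description = description; }
    }

    /** Normalize null once at the boundary; downstream code never null-checks. */
    public static Sort withinGroupSort(Sort requested) {
        return requested == null ? Sort.RELEVANCE : requested;
    }

    public static void main(String[] args) {
        System.out.println(withinGroupSort(null).description);           // relevance
        System.out.println(withinGroupSort(Sort.RELEVANCE).description); // relevance
        Sort byField = new Sort("byField");
        System.out.println(withinGroupSort(byField).description);        // byField
    }
}
```

This mirrors how the issue says {{groupSort}} is already handled, so {{withinGroupSort}} and {{groupSort}} would follow one convention.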
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Laban updated SOLR-8220:
------------------------------
    Attachment: SOLR-8220.patch

Added a patch based on [~ichattopadhyaya]'s latest patch; it still needs to be perf tested. It's a theoretical optimization: skip reading from stored fields if all the requested fields are available in docValues (changes mostly to DocStreamer). Caveats:
- Cannot optimize if any fields are multi-valued.
- Cannot optimize for * queries.
- Does not cache the document (slower in the long run?) -- how can we cache? Using doc.getField, perhaps? Or LazyDocument?

> Read field from docValues for non stored fields
> -----------------------------------------------
>
>                 Key: SOLR-8220
>                 URL: https://issues.apache.org/jira/browse/SOLR-8220
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Keith Laban
>         Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch,
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch
>
> Many times a value will be both stored="true" and docValues="true", which
> requires redundant data to be stored on disk. Since reading from docValues is
> both efficient and a common practice (facets, analytics, streaming, etc.),
> reading values from docValues when a stored version of the field does not
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields, as
> they would always be returned sorted in the docValues approach. I believe
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think
> it should live closer to where stored fields are loaded in the
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues,
> facets, analytics, streaming, etc.; all seem to be doing it their own ways.
> Perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation.)
> Parameters for fl:
> - fl="docValueField"
> -- return the field from docValues if the field is not stored and in docValues;
>    if the field is stored, return it from stored fields
> - fl="*"
> -- return only stored fields
> - fl="+"
> -- return stored fields and docValue fields
> 2a would be the easiest implementation and might be sufficient for a first
> pass; 2b is the current behavior.
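The optimization and its caveats boil down to a per-request check: the stored-fields read can be skipped only when the request is not a wildcard and every requested field is single-valued and docValues-backed. A hedged, self-contained sketch of that decision (hypothetical names; not the actual patch or Solr's schema API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the decision the patch describes: when can the
// stored-fields read be skipped entirely in favor of docValues?
public class DocValuesOnlySketch {

    /** Minimal stand-in for per-field schema flags. */
    public static final class FieldInfo {
        public final boolean hasDocValues;
        public final boolean multiValued;
        public FieldInfo(boolean hasDocValues, boolean multiValued) {
            this.hasDocValues = hasDocValues;
            this.multiValued = multiValued;
        }
    }

    /**
     * True only if every requested field is single-valued and docValues-backed,
     * and the request is not a "*" wildcard (which must fall back to stored fields).
     */
    public static boolean canSkipStoredFields(Set<String> requested,
                                              Map<String, FieldInfo> schema) {
        if (requested.contains("*")) {
            return false;
        }
        for (String field : requested) {
            FieldInfo info = schema.get(field);
            if (info == null || !info.hasDocValues || info.multiValued) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, FieldInfo> schema = new HashMap<>();
        schema.put("id", new FieldInfo(true, false));
        schema.put("tags", new FieldInfo(true, true)); // multi-valued blocks the shortcut
        System.out.println(canSkipStoredFields(Set.of("id"), schema));         // true
        System.out.println(canSkipStoredFields(Set.of("id", "tags"), schema)); // false
        System.out.println(canSkipStoredFields(Set.of("*"), schema));          // false
    }
}
```

The multi-valued exclusion matches the caveat in the issue: docValues return multi-valued content sorted, so reading it in place of stored fields would change the observed order.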
[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields
[ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Laban updated SOLR-8220:
------------------------------
    Attachment: SOLR-8220.patch

Reformatted the patch to be svn style and cleaned up code from the last update.

> Read field from docValues for non stored fields
> -----------------------------------------------
>
>                 Key: SOLR-8220
>                 URL: https://issues.apache.org/jira/browse/SOLR-8220
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Keith Laban
>         Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch,
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
> Many times a value will be both stored="true" and docValues="true", which
> requires redundant data to be stored on disk. Since reading from docValues is
> both efficient and a common practice (facets, analytics, streaming, etc.),
> reading values from docValues when a stored version of the field does not
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields, as
> they would always be returned sorted in the docValues approach. I believe
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think
> it should live closer to where stored fields are loaded in the
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues,
> facets, analytics, streaming, etc.; all seem to be doing it their own ways.
> Perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation.)
> Parameters for fl:
> - fl="docValueField"
> -- return the field from docValues if the field is not stored and in docValues;
>    if the field is stored, return it from stored fields
> - fl="*"
> -- return only stored fields
> - fl="+"
> -- return stored fields and docValue fields
> 2a would be the easiest implementation and might be sufficient for a first
> pass; 2b is the current behavior.