[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Changed in: mahara
       Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Mahara
Contributors, which is subscribed to Mahara.
Matching subscriptions: Subscription for all Mahara Contributors -- please ask on #mahara-dev or mahara.org forum before editing or unsubscribing it!
https://bugs.launchpad.net/bugs/1487274

Title:
  Elasticsearch choking on non-ASCII characters

Status in Mahara:
  Fix Released

Bug description:
  In 15.10 I've added code to "quarantine" records that Elasticsearch
  won't index. That is, if Elasticsearch errors out while processing a
  batch of records, I retry each record individually. If it errors out
  while processing one of those individual records, I mark the record
  as quarantined and keep it in the search_elasticsearch_queue table.

  I've backported that to one of our large 15.04 sites, and since then
  I've looked at the data in the records that caused Elasticsearch to
  choke. They all contain non-ASCII Unicode characters. These range from
  something as simple as "e with an accent over it" all the way up to
  exotic ones like emoji and the Unicode snowman.

  I was not able to replicate this when testing on my local machine, but
  it is certainly happening on our production servers, and bugs such as
  Bug 1408577 make me think it's probably present on some other servers
  as well.

To manage notifications about this bug go to:
https://bugs.launchpad.net/mahara/+bug/1487274/+subscriptions

___
Mailing list: https://launchpad.net/~mahara-contributors
Post to     : mahara-contributors@lists.launchpad.net
Unsubscribe : https://launchpad.net/~mahara-contributors
More help   : https://help.launchpad.net/ListHelp
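The quarantine behaviour in the bug description (send a batch; if it fails, retry each record alone; flag the records that still fail) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Mahara's PHP implementation: `index_with_quarantine`, the `(id, text)` record shape, `fake_send`, and the `send` callable are all hypothetical, and `send` is assumed to raise an exception whenever the search server rejects a request.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical record shape for illustration: (record_id, document_text).
Record = Tuple[int, str]


def index_with_quarantine(
    records: Sequence[Record],
    send: Callable[[Sequence[Record]], None],
) -> Tuple[List[int], List[int]]:
    """Index a batch, falling back to one-by-one retries on failure.

    `send` is a hypothetical callable that raises when the search server
    rejects the request. Returns (indexed_ids, quarantined_ids).
    """
    try:
        send(records)  # fast path: the whole batch is accepted
        return [rid for rid, _ in records], []
    except Exception:
        pass  # at least one record is bad; isolate it below

    indexed: List[int] = []
    quarantined: List[int] = []
    for record in records:
        try:
            send([record])
            indexed.append(record[0])
        except Exception:
            # Return the id so the caller can flag it in the queue
            # instead of resending it with every future batch.
            quarantined.append(record[0])
    return indexed, quarantined


def fake_send(batch: Sequence[Record]) -> None:
    """Stand-in server that rejects any text containing U+FFFD (illustration only)."""
    if any("\ufffd" in text for _, text in batch):
        raise ValueError("rejected batch")


if __name__ == "__main__":
    queue = [(1, "João"), (2, "M\ufffdori"), (3, "snowman \u2603")]
    print(index_with_quarantine(queue, fake_send))  # ([1, 3], [2])
```

The per-record retry only costs extra requests when a batch actually fails, so clean batches keep the throughput of bulk indexing.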
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Changed in: mahara
       Status: Incomplete => Fix Committed
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
Testing this with elasticsearch-php 5.x and Elasticsearch 5.x, I was able to index items named 'João Jiménez Māori' and find them again by searching.
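Elasticsearch receives documents as JSON over its REST API, and well-formed non-ASCII text such as the test title above survives JSON serialization whether it is escaped or sent as raw UTF-8. The Python round trip below illustrates that point under that assumption; it does not exercise elasticsearch-php itself, and `round_trips` is a hypothetical helper.

```python
import json


def round_trips(text: str, ensure_ascii: bool) -> bool:
    """Serialize text in a JSON document, encode to UTF-8 bytes, and parse it back."""
    payload = json.dumps({"title": text}, ensure_ascii=ensure_ascii).encode("utf-8")
    return json.loads(payload.decode("utf-8"))["title"] == text


title = "João Jiménez Māori"
# ensure_ascii=True (the default) escapes non-ASCII characters as \uXXXX.
print(json.dumps({"title": title}))  # {"title": "Jo\u00e3o Jim\u00e9nez M\u0101ori"}

# Accented letters, the snowman and an emoji all survive both encodings.
samples = [title, "\u2603", "\U0001F600"]
print(all(round_trips(s, flag) for s in samples for flag in (True, False)))  # True
```

If well-formed strings round-trip like this, failures are more likely to come from how the bytes were stored or read than from the characters themselves.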
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Changed in: mahara
       Status: Confirmed => Incomplete
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
I notice that on my local machine, Mahara 17.04+ saves 'João Jiménez Māori' in title fields as 'João Jiménez Māori' and in description fields as 'Joo Jimnez Māori'.

But on the cluster machines, running 16.10, it saves it in title fields as 'Joo Jimnez M<81>ori' and in description fields as 'Joo Jimnez M<81>ori'.

If I run

    SELECT 'João Jiménez Māori' AS test;

both machines show the special characters correctly. But if I run

    UPDATE view SET description = 'João Jiménez Māori' where id = 10;

my local machine shows the result as described for local above, while the cluster shows it as described for the cluster above. So the PostgreSQL setup on the cluster must differ in how it handles multibyte UTF-8 characters.

** Changed in: mahara
    Milestone: 17.10.0 => 18.04.0
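In the cluster output, <81> matches the second byte of ā's two-byte UTF-8 encoding (0xC4 0x81), printed the way a pager such as less can display a byte it cannot render. If the stored value really lacks the 0xC4 lead byte, it is no longer valid UTF-8, which would be consistent with the indexing failures; that is a hypothesis, not a confirmed diagnosis. The hypothetical helper `invalid_utf8_offsets` below sketches how such bytes could be located before a record is queued for indexing.

```python
from typing import List


def invalid_utf8_offsets(data: bytes) -> List[int]:
    """Return the byte offsets where invalid UTF-8 sequences start."""
    offsets: List[int] = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the buffer is valid
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice data[pos:].
            offsets.append(pos + err.start)
            pos += err.end  # skip past the bad sequence and keep scanning
    return offsets


good = "Māori".encode("utf-8")
print(good)                        # b'M\xc4\x81ori'
print(invalid_utf8_offsets(good))  # []
# Hypothetical damaged value: the 0xC4 lead byte is missing.
print(invalid_utf8_offsets(b"M\x81ori"))  # [1]
```

On the PHP side, mb_check_encoding($value, 'UTF-8') gives a similar yes/no check; it matters because PHP's json_encode() returns false for malformed UTF-8 by default, which is one possible way such a record could fail to reach Elasticsearch.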
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
Will push this problem out to see whether Elasticsearch, accessed through elasticsearch-php, fixes things.
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
Similar report at bug #1408577

** Changed in: mahara
    Milestone: 17.04.0 => 17.10.0
** No longer affects: mahara/15.10
** No longer affects: mahara/16.04
** No longer affects: mahara/16.10
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
We'll look at this issue when upgrading Elasticsearch to a newer version. Apparently, Elasticsearch can have a few problems with certain languages.
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Changed in: mahara/15.10
    Milestone: 15.10.7 => 15.10.8
** Changed in: mahara/16.04
    Milestone: 16.04.5 => 16.04.6
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Changed in: mahara/16.10
    Milestone: 16.10.2 => 16.10.3
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** No longer affects: mahara/15.04
** No longer affects: mahara/1.9
** No longer affects: mahara/1.10
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Changed in: mahara/16.10
    Milestone: 16.10.1 => 16.10.2
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Changed in: mahara/15.04
       Status: Confirmed => Won't Fix
** Changed in: mahara/15.04
    Milestone: 15.04.10 => None
** Changed in: mahara/15.10
    Milestone: 15.10.6 => 15.10.7
** Changed in: mahara/16.04
    Milestone: 16.04.4 => 16.04.5
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Changed in: mahara/16.10
   Importance: Undecided => High
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Also affects: mahara/16.10
   Importance: Undecided
       Status: New
** Also affects: mahara/17.04
   Importance: High
       Status: Confirmed
** No longer affects: mahara/17.04
** Changed in: mahara/16.10
    Milestone: None => 16.10.1
** Changed in: mahara/16.10
       Status: New => Confirmed
** Changed in: mahara
    Milestone: 16.10.1 => 17.04.0
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Changed in: mahara
    Milestone: 16.10.0 => 16.10.1
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Changed in: mahara/15.10
    Milestone: 15.10.5 => 15.10.6
** Changed in: mahara/16.04
    Milestone: 16.04.3 => 16.04.4
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Changed in: mahara/15.04 Milestone: 15.04.9 => 15.04.10
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Changed in: mahara/15.10 Milestone: 15.10.4 => 15.10.5
** Changed in: mahara/16.04 Milestone: 16.04.2 => 16.04.3
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Changed in: mahara/15.04 Milestone: 15.04.8 => 15.04.9
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Changed in: mahara/1.10 Status: Confirmed => Won't Fix
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Changed in: mahara Milestone: 16.04.1 => 16.10.0
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Changed in: mahara/15.10 Milestone: 15.10.3 => 15.10.4
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Changed in: mahara/16.04 Milestone: 16.04.0 => 16.04.1
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Changed in: mahara/1.10 Milestone: 1.10.9 => 1.10.10
** Changed in: mahara/15.04 Milestone: 15.04.6 => 15.04.7
** Changed in: mahara/15.10 Milestone: 15.10.2 => 15.10.3
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Changed in: mahara/1.10 Milestone: 1.10.8 => 1.10.9
** Changed in: mahara/15.04 Milestone: 15.04.5 => 15.04.6
** Changed in: mahara/15.10 Milestone: 15.10.1 => 15.10.2
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Changed in: mahara/1.9 Status: Confirmed => Won't Fix
** Changed in: mahara/1.9 Milestone: 1.9.9 => None
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Changed in: mahara/1.10 Milestone: 1.10.7 => 1.10.8
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
I found a couple of links about the ASCII folding token filter, which may help:

https://www.elastic.co/guide/en/elasticsearch/reference/1.7/analysis-asciifolding-tokenfilter.html
http://stackoverflow.com/questions/28584780/ignore-ascii-characters-on-elasticsearch
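For reference, the sketch below shows how the asciifolding filter from the first link is typically wired into a custom analyzer (in the index settings supplied when the index is created, and then referenced from the relevant field mappings), plus a rough local approximation of folding for previewing test data. The analyzer name "folding" is hypothetical. The Python function is a swapped-in technique (Unicode NFKD decomposition, then dropping combining marks), not Lucene's ASCIIFoldingFilter, which handles many more characters. Folding only changes which tokens get indexed; whether it would stop the indexing errors reported in this bug is untested.

```python
import json
import unicodedata

# Hypothetical index settings: a custom analyzer (name "folding" is made up)
# that lowercases tokens and then applies Elasticsearch's "asciifolding" filter.
FOLDING_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "folding": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    }
}


def approximate_ascii_fold(text: str) -> str:
    """Rough local approximation of ASCII folding, for eyeballing test data.

    NFKD splits "ã" into "a" plus a combining tilde; dropping combining marks
    leaves "a". This is NOT equivalent to Lucene's ASCIIFoldingFilter, which
    also maps characters such as "ß" to "ss"; emoji and symbols pass through.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(json.dumps(FOLDING_SETTINGS, indent=2))
    for word in ["João", "Jiménez", "Māori"]:
        print(word, "->", approximate_ascii_fold(word))
```

With this approximation, the sample words reported in this bug preview as Joao, Jimenez and Maori.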
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Changed in: mahara/15.04 Milestone: 15.04.4 => 15.04.5
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
** Also affects: mahara/16.04 Importance: Undecided Status: New
** Changed in: mahara/16.04 Milestone: None => 16.04.0
** Changed in: mahara/15.10 Milestone: 15.10.0 => 15.10.1
** Changed in: mahara/16.04 Importance: Undecided => High
** Changed in: mahara/16.04 Status: New => Confirmed
[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters
For testing purposes, here are a few sample words (in page titles, artefact titles, and user names) that have caused Elasticsearch to choke:

João
Jiménez
Māori

It's not clear from our situation whether the problem lies in our Elasticsearch setup or in Mahara's code. I suspect it may be something peculiar to our server setup, because I haven't been able to replicate the problem on my local machine.
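The quarantine logic from the bug description (send a batch; if Elasticsearch errors, retry each record individually; quarantine any record that still fails) can be sketched as follows. This is not Mahara's PHP implementation: `send_batch` and `IndexingError` are hypothetical stand-ins for the search client call and the error it raises, and keeping quarantined records in the search_elasticsearch_queue table is left to the caller.

```python
from typing import Callable, List, Sequence


class IndexingError(Exception):
    """Hypothetical stand-in for the error raised when Elasticsearch rejects a request."""


def index_with_quarantine(
    records: Sequence[dict],
    send_batch: Callable[[Sequence[dict]], None],
) -> List[dict]:
    """Try to index all records in one batch, falling back to one at a time.

    Returns the records that still failed on their own ("quarantined");
    the caller would keep these in its queue, flagged as quarantined.
    """
    try:
        send_batch(records)
        return []  # The whole batch was accepted.
    except IndexingError:
        pass  # Fall through and retry each record individually.

    quarantined: List[dict] = []
    for record in records:
        try:
            send_batch([record])
        except IndexingError:
            quarantined.append(record)
    return quarantined


if __name__ == "__main__":
    def fake_send(batch):
        # Simulated client that rejects any batch containing a non-ASCII title.
        if any(not r["title"].isascii() for r in batch):
            raise IndexingError("simulated rejection")

    queue = [
        {"id": 1, "title": "Joao"},
        {"id": 2, "title": "João"},
        {"id": 3, "title": "Maori"},
    ]
    print(index_with_quarantine(queue, fake_send))
```

Retrying one record at a time costs an extra request per record, but only after a batch has already failed, and it isolates the offending records so the rest of the queue keeps moving.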