[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2018-04-05 Thread Robert Lyon
** Changed in: mahara
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Mahara
Contributors, which is subscribed to Mahara.
Matching subscriptions: Subscription for all Mahara Contributors -- please ask 
on #mahara-dev or mahara.org forum before editing or unsubscribing it!
https://bugs.launchpad.net/bugs/1487274

Title:
  Elasticsearch choking on non-ASCII characters

Status in Mahara:
  Fix Released

Bug description:
  In 15.10 I've added code to "quarantine" records that Elasticsearch
  won't index. That is, if Elasticsearch errors out while processing a
  batch of records, then I re-try each record individually. And if it
  errors out while processing one of those individual records, I mark
  the record as quarantined, and keep it in the
  search_elasticsearch_queue table.
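
A minimal sketch of that retry-and-quarantine flow, written in Python
purely for illustration. The names index_with_quarantine, index_batch
and ascii_only_indexer are hypothetical and are not taken from the
Mahara code base; the stand-in indexer simply rejects non-ASCII text so
that the failure path can be exercised.

```python
def index_with_quarantine(records, index_batch):
    """Index records as one batch; on failure, retry each one alone.

    index_batch is a hypothetical callable that sends a list of records
    to the search server and raises an exception if the request fails.
    Returns (indexed, quarantined).
    """
    try:
        index_batch(records)
        return list(records), []
    except Exception:
        pass  # the batch failed as a whole, so retry record by record

    indexed, quarantined = [], []
    for record in records:
        try:
            index_batch([record])
            indexed.append(record)
        except Exception:
            # This record fails even on its own: mark it as quarantined.
            quarantined.append(record)
    return indexed, quarantined


def ascii_only_indexer(batch):
    """Stand-in for the real indexer: raises on any non-ASCII record."""
    for text in batch:
        text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII


indexed, quarantined = index_with_quarantine(
    ["plain title", "M\u0101ori", "another title"], ascii_only_indexer
)
print(indexed)
print(quarantined)
```

Retrying one record at a time costs extra requests, but it isolates the
failing records so the rest of the batch can still be indexed.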

  I've backported that to one of our large 15.04 sites, and since then
  I've taken a look at the data in the records that have caused
  Elasticsearch to choke. They all contain non-ASCII characters, i.e.
  Unicode characters. These can be as simple as "e with an accent over
  it", all the way up to exotic ones like emoji and the Unicode snowman.
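
To see which characters in a record fall outside the ASCII range, a
check along these lines may help. It is a Python illustration only, and
non_ascii_chars is a hypothetical helper name.

```python
def non_ascii_chars(text):
    """Return the characters of text whose code point is above 127."""
    return [ch for ch in text if ord(ch) > 127]


print(non_ascii_chars("Jim\u00e9nez"))    # the accented e
print(non_ascii_chars("snowman \u2603"))  # the Unicode snowman
print(non_ascii_chars("plain ASCII"))     # nothing found
```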

  I was not able to replicate this when testing on my local machine, but
  it certainly occurs on our production servers, and bugs such as
  Bug 1408577 make me think it's probably present on some other
  servers as well.

To manage notifications about this bug go to:
https://bugs.launchpad.net/mahara/+bug/1487274/+subscriptions

___
Mailing list: https://launchpad.net/~mahara-contributors
Post to : mahara-contributors@lists.launchpad.net
Unsubscribe : https://launchpad.net/~mahara-contributors
More help   : https://help.launchpad.net/ListHelp


[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2018-03-05 Thread Robert Lyon
** Changed in: mahara
   Status: Incomplete => Fix Committed


Status in Mahara:
  Fix Committed



[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2017-12-28 Thread Robert Lyon
Testing this with elasticsearch-php 5.x and Elasticsearch 5.x, I was
able to index items named 'João Jiménez Māori' and search for them
again.


Status in Mahara:
  Incomplete



[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2017-10-21 Thread Kristina Hoeppner
** Changed in: mahara
   Status: Confirmed => Incomplete


Status in Mahara:
  Incomplete



[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2017-09-17 Thread Robert Lyon
I notice that on my local machine Mahara 17.04+ saves

João Jiménez Māori

in the title fields as 'João Jiménez Māori'
and in the description fields as 'Joo Jimnez Māori'

But on cluster machines in 16.10 it saves

in the title fields as 'Joo Jimnez M<81>ori'
in the description fields as 'Joo Jimnez M<81>ori'

If I do a

 SELECT 'João Jiménez Māori' AS test;

They both show the special characters correctly.

But if I do

 UPDATE view SET description = 'João Jiménez Māori' WHERE id = 10;

my local machine shows the result as in the local example above, while
the cluster shows it as in the cluster example above.

So the cluster's PostgreSQL setup must differ in the way it handles
special UTF-8 characters.
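
One possible reading of the cluster output, offered as an assumption
rather than a confirmed diagnosis: the <81> matches the second byte of
the UTF-8 encoding of 'ā' (c4 81), which would be consistent with the
bytes being treated as a single-byte encoding somewhere between client
and server. The Python sketch below only illustrates those bytes;
utf8_hex and misdecode_as_latin1 are hypothetical helper names.
Comparing the output of SHOW server_encoding; and SHOW client_encoding;
on both machines may be a quicker check.

```python
def utf8_hex(text):
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join(f"{b:02x}" for b in text.encode("utf-8"))


def misdecode_as_latin1(text):
    """Simulate a client that reads UTF-8 bytes as if they were Latin-1."""
    return text.encode("utf-8").decode("latin-1")


print(utf8_hex("\u0101"))                       # bytes of 'a with macron'
print(repr(misdecode_as_latin1("M\u0101ori")))  # what a Latin-1 reader sees
```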

** Changed in: mahara
Milestone: 17.10.0 => 18.04.0


Status in Mahara:
  Confirmed



[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2017-09-17 Thread Robert Lyon
Will push this problem out to see whether Elasticsearch, using
elasticsearch-php, fixes things.


Status in Mahara:
  Confirmed



[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2017-03-26 Thread Kristina Hoeppner
Similar report at bug #1408577

** Changed in: mahara
Milestone: 17.04.0 => 17.10.0

** No longer affects: mahara/15.10

** No longer affects: mahara/16.04

** No longer affects: mahara/16.10


Status in Mahara:
  Confirmed



[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2017-03-26 Thread Kristina Hoeppner
We'll look at this issue when upgrading Elasticsearch to a newer
version. Apparently, Elasticsearch can have a few problems with certain
languages.


Status in Mahara:
  Confirmed



[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2016-12-29 Thread Robert Lyon
** Changed in: mahara/15.10
Milestone: 15.10.7 => 15.10.8

** Changed in: mahara/16.04
Milestone: 16.04.5 => 16.04.6


Status in Mahara:
  Confirmed
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed
Status in Mahara 16.10 series:
  Confirmed



[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2016-12-29 Thread Robert Lyon
** Changed in: mahara/16.10
Milestone: 16.10.2 => 16.10.3


Status in Mahara:
  Confirmed
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed
Status in Mahara 16.10 series:
  Confirmed



[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2016-12-12 Thread Kristina Hoeppner
** No longer affects: mahara/15.04

** No longer affects: mahara/1.9

** No longer affects: mahara/1.10


Status in Mahara:
  Confirmed
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed
Status in Mahara 16.10 series:
  Confirmed



[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2016-12-11 Thread Robert Lyon
** Changed in: mahara/16.10
Milestone: 16.10.1 => 16.10.2


Status in Mahara:
  Confirmed
Status in Mahara 1.10 series:
  Won't Fix
Status in Mahara 1.9 series:
  Won't Fix
Status in Mahara 15.04 series:
  Won't Fix
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed
Status in Mahara 16.10 series:
  Confirmed



[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2016-10-24 Thread Robert Lyon
** Changed in: mahara/15.04
   Status: Confirmed => Won't Fix

** Changed in: mahara/15.04
Milestone: 15.04.10 => None

** Changed in: mahara/15.10
Milestone: 15.10.6 => 15.10.7

** Changed in: mahara/16.04
Milestone: 16.04.4 => 16.04.5


Status in Mahara:
  Confirmed
Status in Mahara 1.10 series:
  Won't Fix
Status in Mahara 1.9 series:
  Won't Fix
Status in Mahara 15.04 series:
  Won't Fix
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed
Status in Mahara 16.10 series:
  Confirmed



[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2016-10-20 Thread Robert Lyon
** Changed in: mahara/16.10
   Importance: Undecided => High


Status in Mahara:
  Confirmed
Status in Mahara 1.10 series:
  Won't Fix
Status in Mahara 1.9 series:
  Won't Fix
Status in Mahara 15.04 series:
  Confirmed
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed
Status in Mahara 16.10 series:
  Confirmed



[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2016-10-20 Thread Robert Lyon
** Also affects: mahara/16.10
   Importance: Undecided
   Status: New

** Also affects: mahara/17.04
   Importance: High
   Status: Confirmed

** No longer affects: mahara/17.04

** Changed in: mahara/16.10
Milestone: None => 16.10.1

** Changed in: mahara/16.10
   Status: New => Confirmed

** Changed in: mahara
Milestone: 16.10.1 => 17.04.0


Status in Mahara:
  Confirmed
Status in Mahara 1.10 series:
  Won't Fix
Status in Mahara 1.9 series:
  Won't Fix
Status in Mahara 15.04 series:
  Confirmed
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed
Status in Mahara 16.10 series:
  Confirmed



[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2016-10-20 Thread Robert Lyon
** Changed in: mahara
Milestone: 16.10.0 => 16.10.1


Status in Mahara:
  Confirmed
Status in Mahara 1.10 series:
  Won't Fix
Status in Mahara 1.9 series:
  Won't Fix
Status in Mahara 15.04 series:
  Confirmed
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed



[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2016-08-07 Thread Robert Lyon
** Changed in: mahara/15.10
Milestone: 15.10.5 => 15.10.6

** Changed in: mahara/16.04
Milestone: 16.04.3 => 16.04.4

Status in Mahara:
  Confirmed
Status in Mahara 1.10 series:
  Won't Fix
Status in Mahara 1.9 series:
  Won't Fix
Status in Mahara 15.04 series:
  Confirmed
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed


[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2016-08-07 Thread Robert Lyon
** Changed in: mahara/15.04
Milestone: 15.04.9 => 15.04.10

Status in Mahara:
  Confirmed
Status in Mahara 1.10 series:
  Won't Fix
Status in Mahara 1.9 series:
  Won't Fix
Status in Mahara 15.04 series:
  Confirmed
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed


[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2016-07-10 Thread Robert Lyon
** Changed in: mahara/15.10
Milestone: 15.10.4 => 15.10.5

** Changed in: mahara/16.04
Milestone: 16.04.2 => 16.04.3

Status in Mahara:
  Confirmed
Status in Mahara 1.10 series:
  Won't Fix
Status in Mahara 1.9 series:
  Won't Fix
Status in Mahara 15.04 series:
  Confirmed
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed


[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2016-07-10 Thread Robert Lyon
** Changed in: mahara/15.04
Milestone: 15.04.8 => 15.04.9

Status in Mahara:
  Confirmed
Status in Mahara 1.10 series:
  Won't Fix
Status in Mahara 1.9 series:
  Won't Fix
Status in Mahara 15.04 series:
  Confirmed
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed


[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2016-07-07 Thread Robert Lyon
** Changed in: mahara/1.10
   Status: Confirmed => Won't Fix

Status in Mahara:
  Confirmed
Status in Mahara 1.10 series:
  Won't Fix
Status in Mahara 1.9 series:
  Won't Fix
Status in Mahara 15.04 series:
  Confirmed
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed


[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2016-06-08 Thread Robert Lyon
** Changed in: mahara
Milestone: 16.04.1 => 16.10.0

Status in Mahara:
  Confirmed
Status in Mahara 1.10 series:
  Confirmed
Status in Mahara 1.9 series:
  Won't Fix
Status in Mahara 15.04 series:
  Confirmed
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed


[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2016-05-01 Thread Robert Lyon
** Changed in: mahara/15.10
Milestone: 15.10.3 => 15.10.4

Status in Mahara:
  Confirmed
Status in Mahara 1.10 series:
  Confirmed
Status in Mahara 1.9 series:
  Won't Fix
Status in Mahara 15.04 series:
  Confirmed
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed


[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2016-04-27 Thread Robert Lyon
** Changed in: mahara/16.04
Milestone: 16.04.0 => 16.04.1

Status in Mahara:
  Confirmed
Status in Mahara 1.10 series:
  Confirmed
Status in Mahara 1.9 series:
  Won't Fix
Status in Mahara 15.04 series:
  Confirmed
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed


[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2016-03-22 Thread Robert Lyon
** Changed in: mahara/1.10
Milestone: 1.10.9 => 1.10.10

** Changed in: mahara/15.04
Milestone: 15.04.6 => 15.04.7

** Changed in: mahara/15.10
Milestone: 15.10.2 => 15.10.3

Status in Mahara:
  Confirmed
Status in Mahara 1.10 series:
  Confirmed
Status in Mahara 1.9 series:
  Won't Fix
Status in Mahara 15.04 series:
  Confirmed
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed


[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2015-11-26 Thread Aaron Wells
** Changed in: mahara/1.10
Milestone: 1.10.8 => 1.10.9

** Changed in: mahara/15.04
Milestone: 15.04.5 => 15.04.6

** Changed in: mahara/15.10
Milestone: 15.10.1 => 15.10.2

Status in Mahara:
  Confirmed
Status in Mahara 1.10 series:
  Confirmed
Status in Mahara 1.9 series:
  Won't Fix
Status in Mahara 15.04 series:
  Confirmed
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed


[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2015-10-26 Thread Aaron Wells
** Changed in: mahara/1.9
   Status: Confirmed => Won't Fix

** Changed in: mahara/1.9
Milestone: 1.9.9 => None

Status in Mahara:
  Confirmed
Status in Mahara 1.10 series:
  Confirmed
Status in Mahara 1.9 series:
  Won't Fix
Status in Mahara 15.04 series:
  Confirmed
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed


[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2015-10-18 Thread Robert Lyon
** Changed in: mahara/1.10
Milestone: 1.10.7 => 1.10.8

Status in Mahara:
  Confirmed
Status in Mahara 1.10 series:
  Confirmed
Status in Mahara 1.9 series:
  Confirmed
Status in Mahara 15.04 series:
  Confirmed
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed


[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2015-10-05 Thread Robert Lyon
I found a couple of links about the ASCII folding token filter, which
may help:

https://www.elastic.co/guide/en/elasticsearch/reference/1.7/analysis-asciifolding-tokenfilter.html

http://stackoverflow.com/questions/28584780/ignore-ascii-characters-on-elasticsearch
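
For reference, a custom analyzer using the ASCII folding token filter might be declared along these lines. This is a rough sketch in Python that only builds and prints the index settings JSON; the analyzer name `folding` and the helper `folding_index_settings` are made up for illustration, and nothing here is taken from Mahara's Elasticsearch plugin. Note that this filter changes how text is tokenised for searching (e.g. "é" is folded to "e"); whether it has any effect on the indexing errors in this bug is unconfirmed.

```python
import json


def folding_index_settings(analyzer_name: str = "folding") -> dict:
    """Build index settings with a custom analyzer that folds accents to ASCII."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Hypothetical analyzer name; any name can be used here.
                    analyzer_name: {
                        "tokenizer": "standard",
                        # Built-in token filters: lowercase first, then fold
                        # accented characters to their ASCII equivalents.
                        "filter": ["lowercase", "asciifolding"],
                    }
                }
            }
        }
    }


settings = folding_index_settings()
body = json.dumps(settings, indent=2)
print(body)
```

The printed JSON would be sent as the body when creating an index; the fields that should use the analyzer then need to reference it in their mapping.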

Status in Mahara:
  Confirmed
Status in Mahara 1.10 series:
  Confirmed
Status in Mahara 1.9 series:
  Confirmed
Status in Mahara 15.04 series:
  Confirmed
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed


[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2015-10-05 Thread Robert Lyon
** Changed in: mahara/15.04
Milestone: 15.04.4 => 15.04.5

Status in Mahara:
  Confirmed
Status in Mahara 1.10 series:
  Confirmed
Status in Mahara 1.9 series:
  Confirmed
Status in Mahara 15.04 series:
  Confirmed
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed


[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2015-09-29 Thread Aaron Wells
** Also affects: mahara/16.04
   Importance: Undecided
   Status: New

** Changed in: mahara/16.04
Milestone: None => 16.04.0

** Changed in: mahara/15.10
Milestone: 15.10.0 => 15.10.1

** Changed in: mahara/16.04
   Importance: Undecided => High

** Changed in: mahara/16.04
   Status: New => Confirmed

Status in Mahara:
  Confirmed
Status in Mahara 1.10 series:
  Confirmed
Status in Mahara 1.9 series:
  Confirmed
Status in Mahara 15.04 series:
  Confirmed
Status in Mahara 15.10 series:
  Confirmed
Status in Mahara 16.04 series:
  Confirmed

Bug description:
  In 15.10 I've added code to "quarantine" records that Elasticsearch
  won't index. That is, if Elasticsearch errors out while processing a
  batch of records, then I re-try each record individually. And if it
  errors out while processing one of those individual records, I mark
  the record as quarantined, and keep it in the
  search_elasticsearch_queue table.

  I've backported that to one of our large 15.04 sites, and since then
  I've taken a look at the data in the records that have caused
  Elasticsearch to choke. They all contain non-ASCII characters, i.e.
  Unicode characters. These can be as simple as "e with an accent over
  it", all the way up to exotic ones like emoji and the Unicode snowman.

  I was not able to replicate this when testing on my local machine, but
  it is certainly in place on our production servers, and bugs such as
  Bug 1408577 make me think it's probably also present on some other
  servers as well.
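
  The quarantine flow described above (try the batch, retry each record
  on failure, flag the ones that still fail) can be sketched roughly as
  follows. This is an illustrative Python sketch, not Mahara's PHP code:
  QueueRecord, index_with_quarantine and fake_send are hypothetical
  names, and fake_send merely simulates an indexer that rejects
  non-ASCII text.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QueueRecord:
    """Hypothetical stand-in for a row in search_elasticsearch_queue."""
    record_id: int
    body: str
    quarantined: bool = False


def index_with_quarantine(
    records: List[QueueRecord],
    send_batch: Callable[[List[QueueRecord]], None],
) -> List[QueueRecord]:
    """Index a batch; on failure retry one record at a time.

    Returns the records that stay in the queue, i.e. the quarantined ones.
    """
    try:
        send_batch(records)
        return []  # whole batch accepted, nothing left in the queue
    except Exception:
        pass  # batch failed: fall through to the per-record retry

    remaining = []
    for record in records:
        try:
            send_batch([record])
        except Exception:
            # This record fails on its own, so flag it and keep it queued.
            record.quarantined = True
            remaining.append(record)
    return remaining


def fake_send(batch: List[QueueRecord]) -> None:
    """Simulated indexer (an assumption for this demo): rejects non-ASCII."""
    for record in batch:
        if not record.body.isascii():
            raise ValueError(f"cannot index record {record.record_id}")


queue = [QueueRecord(1, "Maria"), QueueRecord(2, "João"), QueueRecord(3, "Smith")]
left = index_with_quarantine(queue, fake_send)
print([r.record_id for r in left])
```

  Retrying record by record costs extra requests only when a batch
  fails, and it isolates the failing record instead of blocking the
  whole queue.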


[Mahara-contributors] [Bug 1487274] Re: Elasticsearch choking on non-ASCII characters

2015-08-20 Thread Aaron Wells
For testing purposes, here are a few sample words (in page titles,
artefact titles, and user names) that have caused Elasticsearch to
choke:

João
Jiménez
Māori

It's not clear from our situation whether the problem lies in our
Elasticsearch setup, or in Mahara's code. I think it may be something
peculiar to our server setup because I haven't been able to replicate
the problem on my local machine.
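
When testing with these words it can help to confirm exactly which
characters are non-ASCII and what bytes they become in UTF-8. A minimal
Python sketch follows; it does not talk to Elasticsearch or Mahara, the
helper names are hypothetical, and the words are assumed to use
precomposed (NFC) characters.

```python
import json
import unicodedata

SAMPLE_WORDS = ["João", "Jiménez", "Māori"]


def non_ascii_chars(text):
    """List (character, code point, UTF-8 bytes) for each non-ASCII character."""
    normalized = unicodedata.normalize("NFC", text)
    return [
        (ch, f"U+{ord(ch):04X}", ch.encode("utf-8"))
        for ch in normalized
        if ord(ch) > 127
    ]


def utf8_round_trips(text):
    """True if the text survives encoding to UTF-8 and decoding back."""
    return text.encode("utf-8").decode("utf-8") == text


for word in SAMPLE_WORDS:
    # ensure_ascii=False keeps the raw characters in the JSON payload;
    # the default (True) would emit \uXXXX escapes instead.
    payload = json.dumps({"title": word}, ensure_ascii=False)
    print(word, non_ascii_chars(word), utf8_round_trips(word), payload)
```

Comparing this output on a server that shows the problem and on one
that does not may help narrow down where the text is being mangled.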

Title:
  Elasticsearch choking on non-ASCII characters

Status in Mahara:
  Confirmed
Status in Mahara 1.10 series:
  Confirmed
Status in Mahara 1.9 series:
  Confirmed
Status in Mahara 15.04 series:
  Confirmed
Status in Mahara 15.10 series:
  Confirmed
