Re: [HACKERS] Modifying update_attstats of analyze.c for C Strings

2014-07-08 Thread Ashoke
As a follow-up question,

I found some varchar columns whose histogram_bounds are not surrounded by
double quotes (") even with the default implementation.
Ex: *c_name* column of the *Customer* table

I also found histogram_bounds in which only some of the strings are
surrounded by double quotes.
Ex: *c_address* column of the *Customer* table

Why are there such inconsistencies? How is this determined?

Thank you.


-- 
Regards,
Ashoke


Re: [HACKERS] Modifying update_attstats of analyze.c for C Strings

2014-07-08 Thread Ashoke
OK, I was able to figure out that when a string contains spaces,
PostgreSQL surrounds it with double quotes.
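
For illustration, here is a minimal standalone sketch of the quoting rule as I understand it from the array output code (array_out): an element gets double quotes when it is empty, spells NULL, or contains whitespace, the delimiter, braces, a double quote or a backslash. The helper names is_null_word, needs_quote and format_element are hypothetical, and this is a simplified model rather than the backend code itself.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: case-insensitive match against the word "null". */
static int
is_null_word(const char *s)
{
    const char *w = "null";

    while (*s != '\0' && *w != '\0' && tolower((unsigned char) *s) == *w)
    {
        s++;
        w++;
    }
    return *s == '\0' && *w == '\0';
}

/*
 * Hypothetical helper: returns 1 if an array element has to be wrapped in
 * double quotes in the text form of an array, 0 otherwise.
 */
static int
needs_quote(const char *elem, char delim)
{
    const char *p;

    /* An empty string or a bare NULL would otherwise be ambiguous. */
    if (elem[0] == '\0' || is_null_word(elem))
        return 1;

    for (p = elem; *p != '\0'; p++)
    {
        if (*p == '"' || *p == '\\' || *p == '{' || *p == '}' ||
            *p == delim || isspace((unsigned char) *p))
            return 1;
    }
    return 0;
}

/*
 * Hypothetical helper: writes one element into out, adding quotes when
 * needed and backslash-escaping embedded quotes and backslashes.
 * out must have room for 2 * strlen(elem) + 3 bytes.
 * Returns the number of bytes written, not counting the terminator.
 */
static size_t
format_element(const char *elem, char delim, char *out)
{
    char       *o = out;
    const char *p;
    int         quote = needs_quote(elem, delim);

    if (quote)
        *o++ = '"';
    for (p = elem; *p != '\0'; p++)
    {
        if (*p == '"' || *p == '\\')
            *o++ = '\\';
        *o++ = *p;
    }
    if (quote)
        *o++ = '"';
    *o = '\0';
    return (size_t) (o - out);
}
```

With ',' as the delimiter, format_element leaves ARGENTINA bare but wraps a space-padded ALGERIA in quotes. A value containing a comma is quoted as well, so the mix of quoted and unquoted entries in a single histogram_bounds array would follow from the contents of each string.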


-- 
Regards,
Ashoke


[HACKERS] Modifying update_attstats of analyze.c for C Strings

2014-07-07 Thread Ashoke
Hi,

I am trying to implement functionality similar to ANALYZE, but one that
uses different values for the MCV/histogram bounds (the values are valid and
are stored in inp->str[][]) when the column under consideration is varchar
(C strings). I have written a function *dummy_update_attstats* with the
following changes; everything else remains the same as in *update_attstats*
of *~/src/backend/commands/analyze.c*.


---
{
    ArrayType  *arry;

    if (strcmp(col_type, "varchar") == 0)
        arry = construct_array(stats->stavalues[k],
                               stats->numvalues[k],
                               CSTRINGOID,
                               -2,
                               false,
                               'c');
    else
        arry = construct_array(stats->stavalues[k],
                               stats->numvalues[k],
                               stats->statypid[k],
                               stats->statyplen[k],
                               stats->statypbyval[k],
                               stats->statypalign[k]);
    values[i++] = PointerGetDatum(arry);    /* stavaluesN */
}
---

and I update the hist_values in the appropriate function as:
---
if (strcmp(col_type, "varchar") == 0)
    hist_values[i] = datumCopy(CStringGetDatum(inp->str[i][j]),
                               false,
                               -2);
---

I based this on the following reference:
http://www.postgresql.org/message-id/attachment/20352/vacattrstats-extend.diff

My issue is: when I handle strings my way, the MCV/histogram_bounds values
in pg_stats are not surrounded by double quotes ("). That is,

If the normal *update_attstats* is used, histogram_bounds for *TPCH
nation(n_name)* are: "ALGERIA   ",ARGENTINA,...
If I use *dummy_update_attstats* as above, histogram_bounds for *TPCH
nation(n_name)* are: ALGERIA,ARGENTINA,...

This becomes an issue if a string contains commas (','), as for example in
the *n_comment* column of the *nation* table.

Could someone point out the problem and suggest a solution?

Thank you.

-- 
Regards,
Ashoke