RE: Complete data erasure

2020-04-10 Thread asaba.takan...@fujitsu.com
Hello,

My earlier replies missed the point.
I would like to organize the discussion and propose a feature design.

There are two opinions:
1. COMMIT should not take a long time, because a longer COMMIT makes errors more likely.
2. The data area should be released by the time COMMIT completes, because COMMIT has to be an atomic action.

Both opinions are correct, but it is difficult to satisfy them at the same time.
So I suggest giving users the option to choose.
DROP TABLE would work in one of the following two patterns:

1. Rename the data file to "...del" instead of calling ftruncate(fd,0).
  After that, a bgworker scans the directory and runs erase_command.
  (erase_command is a command set by the user, like archive_command;
   for example, shred on Linux.)

2. Run erase_command on the data file immediately before ftruncate(fd,0).
  Wait until it completes, then reply COMMIT to the client.
  After that, processing is the same as normal.

If erase_command fails, a WARNING is issued and the unlink is not requested from the checkpointer.
I do not consider this a security failure, because I think the risk arises only when the data area is returned to the OS.
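
Pattern 1 and the failure rule above can be sketched roughly as follows. This is only an illustration in Python, not PostgreSQL code: the names mark_for_erasure, erase_marked_files, DEL_SUFFIX and ZERO_FILL are hypothetical, the ".del" suffix stands in for the "...del" name above, and ZERO_FILL is a portable stand-in for a real erase_command such as shred.

```python
import os
import subprocess
import sys

DEL_SUFFIX = ".del"  # hypothetical; the mail only writes "...del"

# Portable stand-in for an erase_command such as shred: overwrite with zeros.
ZERO_FILL = [sys.executable, "-c",
             "import os, sys; p = sys.argv[1]; n = os.path.getsize(p); "
             "f = open(p, 'r+b'); f.write(bytes(n)); f.close()"]


def mark_for_erasure(path):
    """Pattern 1, first half: rename the data file instead of truncating it."""
    marked = path + DEL_SUFFIX
    os.rename(path, marked)
    return marked


def erase_marked_files(directory, erase_command):
    """Pattern 1, second half: one scan by a background worker.

    erase_command is an argument list; the file path is appended to it.
    Returns the paths whose erasure failed; those files are kept.
    """
    failed = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(DEL_SUFFIX):
            continue
        path = os.path.join(directory, name)
        if subprocess.run(erase_command + [path]).returncode != 0:
            # Mirrors the rule above: warn and do not unlink.
            print("WARNING: erase_command failed for " + path, file=sys.stderr)
            failed.append(path)
        elif os.path.exists(path):
            os.unlink(path)
    return failed
```

A failed file keeps its ".del" name, so the next scan picks it up again; whether retrying like that is desirable is left open here.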

I will implement pattern 2 first because it is closer to the current user experience than pattern 1.
The following objection has been raised against this method.

From Stephen:
> We certainly can't run external commands during transaction COMMIT, so
> this can't be part of a regular DROP TABLE.

I think this means that errors from external commands cannot be handled.
If so, it is not a problem, because I have defined the behavior after an error.
Are there any other problems?

Regards,

--
Takanori Asaba






RE: Conflict handling for COPY FROM

2020-03-26 Thread asaba.takan...@fujitsu.com
Hello Surafel,

From: Surafel Temesgen 
>An error that can be surly handled without transaction rollback can
>be included in error handling but i will like to proceed without binary file
>errors handling for the time being

Thank you.

It also seems that you have applied Alexey's comments,
so I'll mark this patch as Ready for Committer.

Regards,

--
Takanori Asaba




RE: ssl passphrase callback

2020-03-19 Thread asaba.takan...@fujitsu.com
Hello Andrew,

From: Andreas Karlsson 
> # Nitpicking
> 
> The certificate expires in 2030 while all other certificates used in
> tests expires in 2046. Should we be consistent?
> 
> There is text in server.crt and server.key, while other certificates and
> keys used in the tests do not have this. Again, should we be consistent?
> 
> Empty first line in
> src/test/modules/ssl_passphrase_callback/t/001_testfunc.pl which should
> probably just be removed or replaced with a shebang.
> 
> There is an extra space between the parentheses in the line below. Does
> that follow our code style for Perl?
> 
> +unless ( ($ENV{with_openssl} || 'no') eq 'yes')
> 
> Missing space after comma in:
> 
> +ok(-e "$ddir/postmaster.pid","postgres started");
> 
> Missing space after comma in:
> 
> +ok(! -e "$ddir/postmaster.pid","postgres not started with bad passphrase");
> 
> Andreas
> 

Trailing space:

220 +   X509v3 Subject Key Identifier:
222 +   X509v3 Authority Key Identifier:

Missing "d" ("password"?):

121 +/* init hook for SSL, the default sets the passwor callback if appropriate */

Regards,

--
Takanori Asaba




RE: Complete data erasure

2020-03-18 Thread asaba.takan...@fujitsu.com
Hello Tom,

From: asaba.takan...@fujitsu.com 
> Hello Tom,
> 
> From: Tom Lane 
> > Tomas Vondra  writes:
> > > I think it depends how exactly it's implemented. As Tom pointed out in
> > > his message [1], we can't do the erasure itself in the post-commit is
> > > not being able to handle errors. But if the files are renamed durably,
> > > and the erasure happens in a separate process, that could be OK. The
> > > COMMIT may wayt for it or not, that's mostly irrelevant I think.
> >
> > How is requiring a file rename to be completed post-commit any less
> > problematic than the other way?  You still have a non-negligible
> > chance of failure.
> 
> I think that errors of rename(2) listed in [1] cannot occur or can be handled.
> What do you think?
> 
> [1] http://man7.org/linux/man-pages/man2/rename.2.html
> 

I have another idea.
How about managing the status of each data file the way the WAL archiver does?
For example:

1. Create a status file "...ready" in the transaction that executes DROP TABLE (the data file itself is not renamed).
2. A background worker scans the directory that holds the status files.
3. Rename the status file to "...progress" when erasure of the data file starts.
4. Rename the status file to "...done" when erasure of the data file has finished.

I think this is OK because step 1 is not post-commit and the background worker can handle erasure errors.
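
The four steps can be sketched like this. It is a rough Python illustration, not PostgreSQL code: the function names, the separate status directory, and the zero-overwrite erase routine are all assumptions made for the example.

```python
import os


def status_path(status_dir, relfile, state):
    """Path of the status file for one data file, e.g. "16384.ready"."""
    return os.path.join(status_dir, relfile + "." + state)


def request_erase(status_dir, relfile):
    """Step 1: create "...ready" inside the DROP TABLE transaction."""
    with open(status_path(status_dir, relfile, "ready"), "w"):
        pass


def overwrite_and_unlink(path):
    """Example erase routine: overwrite with zeros, sync, then unlink."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        f.write(bytes(size))
        f.flush()
        os.fsync(f.fileno())
    os.unlink(path)


def worker_pass(status_dir, data_dir, erase):
    """Steps 2-4: scan for "...ready" files and erase each data file."""
    for name in sorted(os.listdir(status_dir)):
        if not name.endswith(".ready"):
            continue
        relfile = name[:-len(".ready")]
        os.rename(status_path(status_dir, relfile, "ready"),
                  status_path(status_dir, relfile, "progress"))
        try:
            erase(os.path.join(data_dir, relfile))
        except OSError as exc:
            # The worker can report the error; the status stays "progress".
            print("WARNING: erase failed for %s: %s" % (relfile, exc))
            continue
        os.rename(status_path(status_dir, relfile, "progress"),
                  status_path(status_dir, relfile, "done"))
```

In this sketch a failed erase leaves the status file at "...progress" and nothing retries it automatically; how such files should be retried or reported is a design question the steps above leave open.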

Regards,

--
Takanori Asaba






RE: Conflict handling for COPY FROM

2020-03-12 Thread asaba.takan...@fujitsu.com
Hello Surafel,

From: Surafel Temesgen  
>>On Fri, Mar 6, 2020 at 11:30 AM asaba.takan...@fujitsu.com wrote:
>>I think we need regression test that constraint violating row is returned 
>>back to the caller.
>>How about this?
>
>okay attached is a rebased patch with it 

Thank you very much.
It is a minor point, but this might be better:
+70005  27  36  46  56  ->  70005  27  37  47  57

I would like to discuss COPY FROM with a binary file.
It seems that this patch tries to skip the error raised when the number of fields differs from the expected count.

+   {
+       if (cstate->error_limit > 0 || cstate->ignore_all_error)
+       {
+           ereport(WARNING,
+                   (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                    errmsg("skipping \"%s\" --- row field count is %d, expected %d",
+                           cstate->line_buf.data, (int) fld_count, attr_count)));
+           cstate->error_limit--;
+           goto next_line;
+       }
+       else
+           ereport(ERROR,
+                   (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                    errmsg("row field count is %d, expected %d",
+                           (int) fld_count, attr_count)));
+
+   }

I tested it like this:

postgres=# CREATE TABLE x (
postgres(# a serial UNIQUE,
postgres(# b int,
postgres(# c text not null default 'stuff',
postgres(# d text,
postgres(# e text
postgres(# );
CREATE TABLE
postgres=# COPY x from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 70004  25  35  45  55
>> 70005  26  36  46  56
>> \.
COPY 2
postgres=# SELECT * FROM x;
   a   | b  | c  | d  | e
-------+----+----+----+----
 70004 | 25 | 35 | 45 | 55
 70005 | 26 | 36 | 46 | 56
(2 rows)

postgres=# COPY x TO '/tmp/copyout' (FORMAT binary);
COPY 2
postgres=# CREATE TABLE y (
postgres(# a serial UNIQUE,
postgres(# b int,
postgres(# c text not null default 'stuff',
postgres(# d text
postgres(# );
CREATE TABLE
postgres=# COPY y FROM '/tmp/copyout' WITH (FORMAT binary,ERROR_LIMIT -1);
2020-03-12 16:55:55.457 JST [2319] WARNING:  skipping "" --- row field count is 5, expected 4
2020-03-12 16:55:55.457 JST [2319] CONTEXT:  COPY y, line 1
2020-03-12 16:55:55.457 JST [2319] WARNING:  skipping "" --- row field count is 0, expected 4
2020-03-12 16:55:55.457 JST [2319] CONTEXT:  COPY y, line 2
2020-03-12 16:55:55.457 JST [2319] ERROR:  unexpected EOF in COPY data
2020-03-12 16:55:55.457 JST [2319] CONTEXT:  COPY y, line 3, column a
2020-03-12 16:55:55.457 JST [2319] STATEMENT:  COPY y FROM '/tmp/copyout' WITH (FORMAT binary,ERROR_LIMIT -1);
WARNING:  skipping "" --- row field count is 5, expected 4
WARNING:  skipping "" --- row field count is 0, expected 4
ERROR:  unexpected EOF in COPY data
CONTEXT:  COPY y, line 3, column a

It seems that the error is not handled correctly.
'WARNING:  skipping "" --- row field count is 5, expected 4' is correct,
but 'WARNING:  skipping "" --- row field count is 0, expected 4' is not.
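
One possible explanation for the second warning (my guess only, I have not confirmed it in the patch) is that the fields of the rejected row are not consumed before the next field count is read. In the binary COPY format each tuple is a 16-bit field count followed, per field, by a 32-bit length and the data. The Python sketch below rebuilds the two rows from the test above, without the file header, and reads the field counts both ways; all function names are made up for the example.

```python
import struct


def encode_row(fields):
    """One tuple: int16 field count, then int32 length + bytes per field."""
    out = struct.pack("!h", len(fields))
    for data in fields:
        out += struct.pack("!i", len(data)) + data
    return out


def counts_without_consuming(stream, reads):
    """Read int16 field counts back to back, never skipping the field data."""
    return [struct.unpack_from("!h", stream, 2 * i)[0] for i in range(reads)]


def counts_with_consuming(stream):
    """Read each field count, then step over that row's fields."""
    counts, pos = [], 0
    while True:
        (n,) = struct.unpack_from("!h", stream, pos)
        pos += 2
        if n == -1:  # file trailer
            return counts
        counts.append(n)
        for _ in range(n):
            (length,) = struct.unpack_from("!i", stream, pos)
            pos += 4 + max(length, 0)  # length -1 marks NULL, no data bytes


# Columns a and b are int4, c/d/e are text, as in table x above.
row1 = [struct.pack("!i", 70004), struct.pack("!i", 25), b"35", b"45", b"55"]
row2 = [struct.pack("!i", 70005), struct.pack("!i", 26), b"36", b"46", b"56"]
stream = encode_row(row1) + encode_row(row2) + struct.pack("!h", -1)

print(counts_without_consuming(stream, 2))  # [5, 0]
print(counts_with_consuming(stream))        # [5, 5]
```

The 0 comes from the upper two bytes of the first field's length (00 00 00 04), which matches "row field count is 0" in the log. If this guess is right, skipping a row in binary mode would need to step over its fields first.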

Also, is it necessary to skip errors that occur when the input is a binary file?
Does it actually happen that rows have different numbers of fields and only some of the rows should be copied?


Regards,

--
Takanori Asaba





RE: Conflict handling for COPY FROM

2020-03-06 Thread asaba.takan...@fujitsu.com
Hello Surafel,

Sorry for my late reply.

From: Surafel Temesgen  
>On Thu, Dec 12, 2019 at 7:51 AM asaba.takan...@fujitsu.com wrote:
>>2. I have a question about copy meta-command.
>>When I executed copy meta-command, output wasn't displayed.
>>Does it correspond to copy meta-command?
>
>Fixed 
Thank you.

I think we need a regression test confirming that a constraint-violating row is returned to the caller.
How about this?

・ src/test/regress/expected/copy2.out

@@ -1,5 +1,5 @@
 CREATE TEMP TABLE x (
-   a serial,
+   a serial UNIQUE,
    b int,
    c text not null default 'stuff',
    d text,
@@ -55,6 +55,16 @@ LINE 1: COPY x TO stdout WHERE a = 1;
  ^
 COPY x from stdin WHERE a = 50004;
 COPY x from stdin WHERE a > 60003;
+COPY x from stdin WITH(ERROR_LIMIT 5);
+WARNING:  skipping "70001  22  32" --- missing data for column "d"
+WARNING:  skipping "70002  23  33  43  53  54" --- extra data after last expected column
+WARNING:  skipping "70003  24  34  44" --- missing data for column "e"
+
+   a   | b  | c  | d  |          e
+-------+----+----+----+----------------------
+ 70005 | 27 | 37 | 47 | before trigger fired
+(1 row)
+
 COPY x from stdin WHERE f > 60003;
 ERROR:  column "f" does not exist


・ src/test/regress/sql/copy2.sql

@@ -1,5 +1,5 @@
 CREATE TEMP TABLE x (
-   a serial,
+   a serial UNIQUE,
    b int,
    c text not null default 'stuff',
    d text,
@@ -110,6 +110,15 @@ COPY x from stdin WHERE a > 60003;
 60005  26  36  46  56
 \.

+COPY x from stdin WITH(ERROR_LIMIT 5);
+70001  22  32
+70002  23  33  43  53  54
+70003  24  34  44
+70004  25  35  45  55
+70005  26  36  46  56
+70005  27  37  47  57
+\.
+
 COPY x from stdin WHERE f > 60003;

 COPY x from stdin WHERE a = max(x.b);


Regards,

--
Takanori Asaba




RE: Complete data erasure

2020-02-20 Thread asaba.takan...@fujitsu.com
Hello Tom,

From: Tom Lane 
> Tomas Vondra  writes:
> > I think it depends how exactly it's implemented. As Tom pointed out in
> > his message [1], we can't do the erasure itself in the post-commit is
> > not being able to handle errors. But if the files are renamed durably,
> > and the erasure happens in a separate process, that could be OK. The
> > COMMIT may wayt for it or not, that's mostly irrelevant I think.
> 
> How is requiring a file rename to be completed post-commit any less
> problematic than the other way?  You still have a non-negligible
> chance of failure.

I think that the rename(2) errors listed in [1] either cannot occur here or can be handled.
What do you think?

[1] http://man7.org/linux/man-pages/man2/rename.2.html


Regards,

--
Takanori Asaba






RE: Complete data erasure

2020-02-20 Thread asaba.takan...@fujitsu.com
Greetings,

From: asaba.takan...@fujitsu.com 
> Hello Stephen,
> 
> From: Stephen Frost 
> > I disagree- it's a feature that's been asked for multiple times and does
> > have value in some situations.
> 
> I'm rethinking the need for this feature although I think that it improves the
> security.
> You said that this feature has value in some situations.
> Could you tell me about that situations?
> 
> Regards,
> 
> --
> Takanori Asaba
> 

I think the use case is to ensure that no data remains.
This feature will give users peace of mind.

There is a risk of leakage as long as data remains, and I think users worry about several things.
For example, attackers might eventually decrypt encrypted data, even if it takes years.
Some users may also be concerned about how disks are managed in cloud environments.
These concerns are resolved if users can erase the data themselves.

I think that this feature is valuable, so I would appreciate your continued cooperation.

Regards,

--
Takanori Asaba






RE: Complete data erasure

2020-02-09 Thread asaba.takan...@fujitsu.com
Hello Stephen,

From: Stephen Frost 
> I disagree- it's a feature that's been asked for multiple times and does
> have value in some situations.

I'm rethinking the need for this feature, although I think that it improves security.
You said that this feature has value in some situations.
Could you tell me about those situations?

Regards,

--
Takanori Asaba




RE: Complete data erasure

2020-01-21 Thread asaba.takan...@fujitsu.com
Hello Stephen,

Thank you for your comments.

From: Stephen Frost 
> Greetings,
> 
> * asaba.takan...@fujitsu.com (asaba.takan...@fujitsu.com) wrote:
> > This feature erases data area just before it is returned to the OS (“erase”
> > means that overwrite data area to hide its contents here)
> > because there is a risk that the data will be restored by attackers if it
> > is returned to the OS without being overwritten.
> > The erase timing is when DROP, VACUUM, TRUNCATE, etc. are executed.
> 
> Looking at this fresh, I wanted to point out that I think Tom's right-
> we aren't going to be able to reasonbly support this kind of data
> erasure on a simple DROP TABLE or TRUNCATE.
> 
> > I want users to be able to customize the erasure method for their security
> > policies.
> 
> There's also this- but I think what it means is that we'd probably have
> a top-level command that basically is "ERASE TABLE blah;" or similar
> which doesn't operate during transaction commit but instead marks the
> table as "to be erased" and then perhaps "erasure in progress" and then
> "fully erased" (or maybe just back to 'normal' at that point).  Making
> those updates will require the command to perform its own transaction
> management which is why it can't be in a transaction itself but also
> means that the data erasure process doesn't need to be done during
> commit.
> 
> > My idea is adding a new parameter erase_command to postgresql.conf.
> 
> Yeah, I don't think that's really a sensible option or even approach.

I think erase_command can also track the state of a table.
The exit status of the configured command indicates it (0 means "fully erased" or "normal", 1 means "erasure in progress").
erase_command is executed not during a transaction but when unlink() is executed
(for example, after a transaction that has executed DROP TABLE).
I think that this corresponds to the "to be erased" state.

> > When erase_command is set, VACUUM does not truncate a file size to non-zero
> > because it's safer for users to return the entire file to the OS than to
> > return part of it.
> 
> There was discussion elsewhere about preventing VACUUM from doing a
> truncate on a file because of the lock it requires and problems with
> replicas..  I'm not sure where that ended up, but, in general, I don't
> think this feature and VACUUM should really have anything to do with
> each other except for the possible case that a user might be told to
> configure their system to not allow VACUUM to truncate tables if they
> care about this case.

I think that if VACUUM executes ftruncate(fd, 0), the data area allocated to the file is returned to the OS,
so that area must be overwritten first.

> As mentioned elsewhere, you do also have to consider that the sensitive
> data will end up in the WAL and on replicas.  I don't believe that means
> this feature is without use, but it means that users of this feature
> will also need to understand and be able to address WAL and replicas
> (along with backups and such too, of course).

I see.
I cannot think of a solution right away, but I will address it.

Sorry for the late reply.
It takes me time to understand your emails because I am a beginner.
Please point out any mistakes.

Regards,

--
Takanori Asaba




RE: Complete data erasure

2020-01-17 Thread asaba.takan...@fujitsu.com
Hello, Horiguchi-san

Thank you for your comments.

At Wed, 15 Jan 2020 03:46 +, "Kyotaro Horiguchi " 
wrote in
> shred(1) or wipe(1) doesn't seem to contribute to the objective on
> journaled or copy-on-write file systems. I'm not sure, but maybe the
> same can be true for read-modify-write devices like SSD. I'm not sure
> about SDelete, but anyway replacing unlink() with something like
> 'system("shred")' leads to siginificant performance degradation.
> 
> man 1 wipe says (https://linux.die.net/man/1/wipe) : (shred has a
> similar note.)
> 
> > NOTE ABOUT JOURNALING FILESYSTEMS AND SOME RECOMMENDATIONS (JUNE 2004)
> > Journaling filesystems (such as Ext3 or ReiserFS) are now being used
> > by default by most Linux distributions. No secure deletion program
> > that does filesystem-level calls can sanitize files on such
> > filesystems, because sensitive data and metadata can be written to the
> > journal, which cannot be readily accessed. Per-file secure deletion is
> > better implemented in the operating system.

shred is effective in certain modes of journaled file systems.
How about telling users that they must use one of those modes
if they set shred as erase_command on a journaled file system?
man 1 shred continues as follows:

> In the case of ext3 file systems, the above disclaimer applies (and shred is thus
> of limited effectiveness) only in data=journal mode, which journals file data in
> addition to just metadata.  In both the data=ordered (default) and data=writeback
> modes, shred works as usual.  Ext3 journaling modes can be changed by adding the
> data=something option to the mount options for a particular file system in the
> /etc/fstab file, as documented in the mount man page (man mount).

As shown above, shred works as usual in both the data=ordered (default) and data=writeback modes.
I think data=journal mode is rarely used because it degrades performance.
Therefore, I think it is enough to document that shred is not effective in data=journal mode.
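
As a rough illustration of what such a note could ask users to check, the Python sketch below classifies one /etc/fstab-style line using only the ext3 rule quoted above. The function names are hypothetical, and anything other than ext3 is reported as "unknown" rather than assumed safe.

```python
def shred_effectiveness(fstype, mount_options):
    """Apply only the ext3 rule quoted from the shred man page above."""
    if fstype != "ext3":
        return "unknown"  # other file systems are not assessed here
    options = [opt.strip() for opt in mount_options.split(",")]
    if "data=journal" in options:
        return "limited"
    return "usual"  # data=ordered (the default) or data=writeback


def check_fstab_line(line):
    """Classify one fstab line: device, mount point, type, options, ..."""
    fields = line.split()
    return fields[1], shred_effectiveness(fields[2], fields[3])


print(check_fstab_line("/dev/sda2 /var/lib/pgsql ext3 defaults,data=journal 1 2"))
print(check_fstab_line("/dev/sda2 /var/lib/pgsql ext3 defaults 1 2"))
```

The device and mount point in the sample lines are invented for the example.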

Regards,

--
Takanori Asaba



Complete data erasure

2020-01-14 Thread asaba.takan...@fujitsu.com
Hello hackers,

I want to add a feature that erases data so that it cannot be restored,
because this prevents attackers from stealing data from a released data area.

- Background
International security policies require measures against the above threat.
One such policy is the "Base Protection Profile for Database Management Systems Version 2.12
(DBMS PP)" [1], which is based on ISO/IEC 15408.
If security is improved, PostgreSQL will be more likely to be adopted by
security-conscious procurers such as public agencies.

- Feature
This feature erases a data area just before it is returned to the OS
(here, "erase" means overwriting the data area to hide its contents),
because there is a risk that attackers will restore the data if it is
returned to the OS without being overwritten.
Erasure happens when DROP, VACUUM, TRUNCATE, etc. are executed.
I want users to be able to customize the erasure method to match their security
policies.

- Implementation
My idea is to add a new parameter, erase_command, to postgresql.conf.
The command that users set in this parameter is executed just before
unlink(path) or ftruncate(fd, 0) is called.
For example, the command could be shred on Linux or SDelete on Windows.

When erase_command is set, VACUUM does not truncate a file to a non-zero size,
because it is safer for users to return the entire file to the OS than to return
part of it.
Also, there is no standard tool that overwrites only part of a file.
With the above specification, users can use this feature easily and safely
with a standard tool that overwrites an entire file, such as shred.
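
The intended behavior can be sketched as follows. This is a Python illustration only, not the planned C implementation: the function names are hypothetical, ZERO_FILL is a portable stand-in for a command such as shred, and raising an error when the command fails is an assumption made for the example.

```python
import os
import subprocess
import sys

# Portable stand-in for an erase_command such as shred: overwrite with zeros.
ZERO_FILL = [sys.executable, "-c",
             "import os, sys; p = sys.argv[1]; n = os.path.getsize(p); "
             "f = open(p, 'r+b'); f.write(bytes(n)); f.close()"]


def run_erase(path, erase_command):
    """Run erase_command (an argument list) with the file path appended."""
    if subprocess.run(erase_command + [path]).returncode != 0:
        # Failure handling is an assumption of this sketch.
        raise RuntimeError("erase_command failed for " + path)


def erase_then_unlink(path, erase_command):
    """unlink(path), preceded by erase_command when one is configured."""
    if erase_command:
        run_erase(path, erase_command)
    if os.path.exists(path):  # a command like "shred -u" removes the file itself
        os.unlink(path)


def vacuum_truncate(path, new_size, erase_command):
    """Return True if the file was truncated, False if truncation was skipped.

    The erase command is assumed to overwrite in place and keep the file.
    """
    if erase_command and new_size > 0:
        return False  # never hand back only part of a file
    if erase_command:
        run_erase(path, erase_command)
    os.truncate(path, new_size)
    return True
```

Skipping the partial truncation means the tail of the file stays allocated until the whole file can be erased and released at once.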

Hope to hear your feedback and comments.

[1] https://www.commoncriteriaportal.org/files/ppfiles/pp0088V2b_pdf.pdf
P44 8.1.2

- Threat/Policy
A threat agent may use or manage TSF, bypassing the protection mechanisms of 
the TSF.

- TOE Security Objectives Addressing the Threat/Policy 
The TOE will ensure that any information contained in a protected resource 
within its Scope of Control 
is not inappropriately disclosed when the resource is reallocated.

- Rationale
diminishes this threat by ensuring that TSF data and user data is not persistent
when resources are released by one user/process and allocated to another 
user/process.

TOE: Target of Evaluation
TSF: TOE Security Functionality


Regards

--
Takanori Asaba





RE: Conflict handling for COPY FROM

2019-12-11 Thread asaba.takan...@fujitsu.com
Hello Surafel,

I'm very interested in this patch.
Although I'm a beginner, I would like to participate in the development of
PostgreSQL.


1. I would like to suggest a new output format.
In my opinion, it would be helpful to display a description of the output and to
add "line number" and "error" columns.
For example:

error lines

 line number | first | second | third |  error
-------------+-------+--------+-------+----------
           1 |     1 |     10 |   0.5 |   UNIQUE
           2 |     2 |     42 |   0.1 |    CHECK
           3 |     3 |   NULL |     0 | NOT NULL
(3 rows)

Although only unique or exclusion constraint violations are currently returned to
the caller,
I think that the "error" column will be useful once it becomes possible to handle
other types of errors (check, not-null, and so on).

If you assume that users re-execute COPY FROM with the output lines as input,
these extra columns get in the way.
Therefore I think that this output format should be used only when a new option
(for example, ERROR_VERBOSE) is set, as in "COPY FROM ... ERROR_VERBOSE;".
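
To make the two output modes concrete, here is a small Python sketch that formats rejected rows either plainly (so they can be fed back into COPY FROM) or with the extra columns. The function name, the sample rows, and the right-aligned layout are assumptions for illustration only, and ERROR_VERBOSE is just the option name proposed above.

```python
SAMPLE_ROWS = [
    (1, (1, 10, 0.5), "UNIQUE"),
    (2, (2, 42, 0.1), "CHECK"),
    (3, (3, None, 0), "NOT NULL"),
]


def format_rejected(rows, columns, verbose=False):
    """Format (line_number, values, error) tuples as a text table.

    With verbose=False only the original column values are shown.
    """
    header = list(columns)
    if verbose:
        header = ["line number"] + header + ["error"]
    body = []
    for line_number, values, error in rows:
        cells = ["NULL" if v is None else str(v) for v in values]
        if verbose:
            cells = [str(line_number)] + cells + [error]
        body.append(cells)
    widths = [max(len(row[i]) for row in [header] + body)
              for i in range(len(header))]
    lines = [" | ".join(c.rjust(w) for c, w in zip(header, widths)),
             "-+-".join("-" * w for w in widths)]
    lines += [" | ".join(c.rjust(w) for c, w in zip(row, widths))
              for row in body]
    return lines


for line in format_rejected(SAMPLE_ROWS, ("first", "second", "third"), verbose=True):
    print(line)
```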


2. I have a question about the \copy meta-command.
When I executed \copy, the output was not displayed.
Does this patch support the \copy meta-command?

Regards

--
Asaba Takanori